Abstract
Spatio-spectral properties of the Wavelet Transform provide a useful theoretical
framework to investigate the structure of neural networks. A few researchers
(Pati & Krishnaprasad, Zhang & Benveniste) have investigated the connection
between neural networks and wavelet transforms. However, a number of issues
remain unresolved, especially when the connection is considered in the
multi-dimensional case. In our work, we resolve these issues by extensions based on
some theorems of Daubechies related to wavelet frames and provide a framework
to analyze local learning in neural-networks.


We  also  provide  a  constructive  procedure  to  build  networks  based  on  wavelet 
theory.  Moreover,  cognizant  of  the  problems  usually  encountered  in  practical 
implementations  of  these  ideas,  we  develop  a  heuristic  methodology,  inspired  by 
similar  work  in  the  area  of  Radial  Basis  Function  (RBF)  networks  (Moody  & 
Darken,  Platt),  to  build  a  network  sequentially  on-line  as  well  as  off-line. 

We  show  some  connections  of  our  method  to  some  existing  methods  such 
as  Projection  Pursuit  Regression  (Friedman),  Hyper  Basis  Functions  (Poggio  & 
Girosi)  and  other  methods  that  have  been  proposed  in  the  literature  on  neural- 
networks  as  well  as  statistics.  In  particular,  some  classes  of  wavelets  can  also  be 
derived  from  the  regularization  theoretical  framework  given  by  Poggio  &  Girosi. 

Finally, we choose direct nonlinear adaptive control to demonstrate the utility
of the network in the context of local learning. Stability analysis is carried
out within a standard Lyapunov formulation. Simulation studies show the
effectiveness of these methods. We compare and contrast these methods with
some recent results obtained by other researchers using Back Propagation
(Feed-Forward) Networks, and Gaussian Networks.
Chapter 1


Introduction 


‘Learning’ can be viewed as providing an approximation to a desired mapping
within a given tolerance. From an electrical engineering perspective, interest in
theories of learning arises owing to the presence of many systems that are unknown
or only partially known and are difficult to model. In such situations the
mapping needs to be implemented from observations during interactions with
the system. To solve this problem, researchers in several disciplines have developed
tools that can be graphically interpreted as ‘networks’. Although these
tools initially derived some inspiration from biological observations, approximation
theory and statistical/information-theoretic methods have been recognized
as essential tools to tackle the enormous complexity inherent in the method.
Reflecting this diversity of disciplines, and depending on the application domain,
such networks are variously known as ‘neural networks’, ‘statistical
networks’, ‘connectionist networks’ and ‘biological networks’.

In spite of the explosive growth of research in this area in recent years, the
methodology has largely remained heuristic; precise mathematical methods are
often difficult to derive or, when derived, remain largely without any practical
merit. As a result, tools that can provide more insight into their structure and
‘de-mystify’ them are important. One finds such a tool in wavelets, and in this
thesis we focus our attention on how well this tool can help in this task.

First we have to address the issue of learning from the viewpoint of approximation
theory. For this purpose, we use the theory of wavelets, with wavelets
serving as an alternative to Gaussian Radial Basis Functions (GRBF) for
implementing local learning. In a related spirit, Poggio and Girosi [27] showed that
Radial Basis Function Networks can be derived from Regularization Theory, and
Pati and Krishnaprasad [25, 23] have shown that feed-forward neural networks
can be considered within the framework provided by discrete wavelet transform
theory. Zhang and Benveniste [34] give a somewhat different treatment of this
connection between neural networks and wavelet transforms. However, neither an
adequate treatment of theoretical issues (e.g., the construction of wavelet frames,
the method of dilating, bounds on the error in approximation using a finite subset
of dilations and translations) in high-dimensional problems, nor practically
feasible ways to tackle the ubiquitous ‘curse of dimensionality’, is available in
the related literature. This posed one of the two major motivations for this
thesis; the other motivation will become clear in the course of this chapter.

The  theory  of  multi-dimensional  approximation  using  wavelets  is  developed  in 
chapter  2.  In  particular,  we  extend  the  sufficient  conditions  given  by  Daubechies 
for 1-D wavelet frames to the multi-dimensional case in two different ways (i.e.,
using  single  and  multiple  dilation  parameters)  and  show  that  frames  can  be 
constructed  from  a  single  mother  wavelet.  In  the  first  case  we  can  construct 
radial  wavelet  frames,  and  in  the  second  case,  the  tensor  product  construction 
leads  to  valid  frames. 
The fact that wavelets are local functions suggests that learning in wavelet
networks will face many of the dilemmas faced with local learning networks
such as the RBFNs, local polynomial fitting, etc. A major problem in all these
cases is of course the ‘curse of dimensionality’, which is well known across
disciplinary boundaries: if one were to adhere strictly to the mathematical theory, the
number of network ‘units’ needed becomes so excessive as to render the theory
meaningless in practice. This realization has led even mathematicians schooled
in rigorous theory to search for useful approximate or heuristic methods to solve
complex real-world problems (see, for instance, Friedman [13] and the discussion
that followed it).

In  chapter  3,  we  use  the  theoretical  results  in  chapter  2  to  develop  wavelet- 
based  networks,  and  then  examine  suitable  heuristic  procedures  to  make  the 
proposed  methods  practically  more  relevant. 

Chapter 4 addresses the connection of the methods proposed in previous
chapters to existing methods in diverse literature. Given the diversity and
generality of neural network methods, it is not surprising that several
methods that have existed independently in statistics have been brought into
the ‘neural’ framework in recent years. In that spirit, we discuss the connection of
the wavelet methodology to existing neural network methods and other methods
such as Projection Pursuit Regression [11, 12] and Regularization Theory [27].
In particular, by simple modifications, we show that symmetric wavelets can be
derived as special classes of the regularization network functionals.

Neural networks are in general applicable in a broad range of areas that
include computer vision/image processing, adaptive signal processing (filtering,
prediction) and control. Many of the theoretical ideas that originated from
control theorists have found their way into adaptive signal processing. A detailed
view of the two areas can be obtained from Widrow and Stearns [33], and Astrom
and Wittenmark [1]. Neural networks in control problems pose more challenges
because of the need to prove stability, error convergence, etc. in a rigorous
fashion. Moreover, neural networks can strengthen the existing connection between
adaptive control and adaptive signal processing and bring them closer. This
provided a parallel motivation for this thesis. The utility of the proposed methods
is therefore investigated in adaptive control problems.

The literature on adaptive control and signal processing abounds with
techniques derived from linear systems or local linearization. What makes
neural-networks attractive is their ability to solve non-linear problems in signal
processing and control. As a result, in recent years, adaptive control strategies using
neural-networks have been investigated by a number of researchers: Narendra
and Parthasarathy [19, 20], Chen and Khalil [3], Sanner and Slotine [30], to
name a few. These researchers have looked at either multi-layer feed-forward
networks or GRBFNs. Sanner and Slotine [30] also recognize the possibility of
using wavelets instead of GRBFs, but to our knowledge a rigorous theory for
using wavelets in a multi-dimensional network has not been developed in any work
that pre-dates this work. In chapter 5, we formulate adaptive control problems
and show how the wavelet network can be used in such situations.

Chapter  6  deals  with  results  from  simulation  studies,  the  conclusions  drawn, 
and  future  directions. 
Chapter 2


Multidimensional  Wavelets  and 
Function  Approximation 


2.1  Wavelet  Transform 


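Suppose $\psi \in L^2(\mathbb{R})$ satisfies the following admissibility condition:

$$\int_{-\infty}^{\infty} \frac{|\hat{\psi}(\omega)|^2}{|\omega|}\, d\omega < \infty .$$

Then dilations and translations of the function $\psi$ can be used to capture
the ‘local character in space and frequency’ of an arbitrary function $f \in L^2(\mathbb{R})$.
This property is captured in the following relations of the continuous wavelet
transform (for a detailed analysis of this theory the reader is referred to [8]):

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{da\, db}{a^2}\, \langle f, \psi^{a,b}\rangle \langle \psi^{a,b}, g\rangle = C_\psi \langle f, g\rangle ,$$

$$f = C_\psi^{-1} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{da\, db}{a^2}\, \langle f, \psi^{a,b}\rangle\, \psi^{a,b} ,$$

where $\psi^{a,b}(x) = |a|^{-1/2}\,\psi\!\left(\frac{x-b}{a}\right)$, and $C_\psi$ is given by

$$C_\psi = 2\pi \int_{-\infty}^{\infty} d\omega\, \frac{|\hat{\psi}(\omega)|^2}{|\omega|} .$$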
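As a numerical illustration of the admissibility condition, the following sketch evaluates $C_\psi$ for the 1-D Mexican hat $\psi(x) = (1 - x^2)e^{-x^2/2}$. The choice of wavelet, the unitary Fourier convention $\hat{\psi}(\omega) = (2\pi)^{-1/2}\int \psi(x)e^{-i\omega x}dx$ (under which $\hat{\psi}(\omega) = \omega^2 e^{-\omega^2/2}$), and all function names are assumptions made for this example only; they are not taken from the thesis.

```python
import numpy as np

def mexican_hat_ft(omega):
    """Fourier transform of (1 - x^2) exp(-x^2 / 2) under the unitary
    convention assumed here: psi_hat(w) = w^2 exp(-w^2 / 2)."""
    return omega**2 * np.exp(-omega**2 / 2.0)

def admissibility_constant(psi_hat, w_max=12.0, num=200_001):
    """Approximate C_psi = 2*pi * integral of |psi_hat(w)|^2 / |w| dw.

    The integrand is even, so we integrate over (0, w_max] and double it.
    """
    w = np.linspace(1e-8, w_max, num)
    integrand = np.abs(psi_hat(w))**2 / w
    # Composite trapezoidal rule, written out explicitly.
    dw = w[1] - w[0]
    half_line = dw * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return 2.0 * np.pi * 2.0 * half_line

c_psi = admissibility_constant(mexican_hat_ft)
```

For this particular wavelet the integral can also be done by hand, $C_\psi = 2\pi \cdot 2\int_0^\infty \omega^3 e^{-\omega^2} d\omega = 2\pi$, so the numerical value should agree with $2\pi$ closely. A wavelet with $\hat{\psi}(0) \neq 0$ would make the integrand blow up at the origin, which is why a zero-mean mother wavelet is needed.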


When  one  goes  from  the  continuous  case  to  the  discrete  case,  the  notion  of 
frames  becomes  necessary.  A  formal  definition  of  frames  is  given  below. 

Definition 

Given a Hilbert space $\mathcal{H}$ and a sequence of vectors $\{h_n\}_{n \in \mathbb{Z}} \subset \mathcal{H}$, $\{h_n\}$ is said
to constitute a frame for $\mathcal{H}$ if there exist two constants $A > 0$ and $B < \infty$ such that,
for all $f \in \mathcal{H}$, the following inequalities hold:

$$A\|f\|^2 \le \sum_n |\langle h_n, f\rangle|^2 \le B\|f\|^2 .$$

When the frame condition is satisfied, one can define a frame operator $S$ as

$$Sf = \sum_n \langle f, h_n\rangle\, h_n$$

and decompose the function $f$ as

$$f = \sum_n w_n h_n$$

where $w_n = \langle f, S^{-1}h_n\rangle$. The Hilbert space we consider is $L^2(\mathbb{R}^n)$, the space
of square-integrable functions over $\mathbb{R}^n$. In the following we will consider wavelet
frames. Daubechies has given sufficient conditions for 1-D wavelet frames. Here
we extend these conditions to the multi-dimensional case, both for joint
dilations in all dimensions and for separate dilations and translations in each
dimension.
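The frame definition is easiest to see in a finite-dimensional Hilbert space. The sketch below is an illustration only (the three-vector frame of $\mathbb{R}^2$ and the helper names are assumptions, not part of the thesis): it builds the frame operator $S$, reads the optimal frame bounds $A$ and $B$ off its eigenvalues, and reconstructs a vector from the coefficients $w_n = \langle f, S^{-1}h_n\rangle$.

```python
import numpy as np

def frame_operator(H):
    """Frame operator S = sum_n h_n h_n^T for frame vectors stored as rows of H."""
    return H.T @ H

def frame_bounds(H):
    """Optimal frame bounds A, B: smallest and largest eigenvalues of S."""
    eig = np.linalg.eigvalsh(frame_operator(H))
    return eig[0], eig[-1]

def reconstruct(f, H):
    """f = sum_n <f, S^{-1} h_n> h_n, using the dual frame S^{-1} h_n."""
    S_inv = np.linalg.inv(frame_operator(H))
    dual = H @ S_inv            # rows are S^{-1} h_n (S is symmetric)
    w = dual @ f                # w_n = <f, S^{-1} h_n>
    return H.T @ w              # sum_n w_n h_n

# Three unit vectors 120 degrees apart: a redundant (non-basis) frame of R^2.
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
H = np.column_stack([np.cos(angles), np.sin(angles)])

A, B = frame_bounds(H)
f = np.array([0.3, -1.7])
f_rec = reconstruct(f, H)
```

For this example $S = \tfrac{3}{2}I$, so $A = B = 3/2$ (a tight frame) and the reconstruction is exact even though the three vectors are linearly dependent. In the wavelet setting the same roles are played by $\psi_{mn}$ and the dual frame $S^{-1}\psi_{mn}$.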

2.2  Single-scaling  wavelet  frame 

In this section we show that it is possible to build single-scaling multi-dimensional
wavelet frames by using a single mother wavelet. For this purpose we generalize
Daubechies’ theorem on sufficient conditions for wavelet frames [7] to the
single-scaling multi-dimensional case.


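Theorem 1 Let $\psi \in L^2(\mathbb{R}^n)$. Consider a family of dilated and translated functions
of the form

$$\Psi(a, b) = \{\psi_{l,k}(x) = a^{-ln/2}\,\psi(a^{-l}x - bk) : l \in \mathbb{Z},\ k \in \mathbb{Z}^n\} \tag{2.1}$$

where $x \in \mathbb{R}^n$, $a, b \in \mathbb{R}$ and $a > 1$. If the following three conditions (2.2), (2.3)
and (2.4) are satisfied

$$m(\psi, a) \triangleq \operatorname*{ess\,inf}_{\|\omega\| \in [1, a]} \sum_{l \in \mathbb{Z}} |\hat{\psi}(a^l\omega)|^2 > 0 \tag{2.2}$$

$$M(\psi, a) \triangleq \operatorname*{ess\,sup}_{\|\omega\| \in [1, a]} \sum_{l \in \mathbb{Z}} |\hat{\psi}(a^l\omega)|^2 < \infty \tag{2.3}$$

$$\sup_{\eta \in \mathbb{R}^n} \left[(1 + \eta^T\eta)^{n(1+\epsilon)/2}\, \beta(\eta)\right] = C_\epsilon < \infty \quad \text{for some } \epsilon > 0 \tag{2.4}$$

where

$$\beta(\eta) = \sup_{\|\omega\| \in [1, a]} \sum_{l \in \mathbb{Z}} |\hat{\psi}(a^l\omega)| \cdot |\hat{\psi}(a^l\omega + \eta)| \tag{2.5}$$

then there exists $b_0 > 0$ such that for all $b \in (0, b_0)$, the family $\Psi(a, b)$ in (2.1)
constitutes a frame of $L^2(\mathbb{R}^n)$; in other words, there exist two constants $A > 0$ and
$B < +\infty$ such that for all $f \in L^2(\mathbb{R}^n)$, the following inequalities hold

$$A\|f\|^2 \le \sum_{l,k} |\langle \psi_{l,k}, f\rangle|^2 \le B\|f\|^2$$

where the sum ranges are $l \in \mathbb{Z}$ and $k \in \mathbb{Z}^n$, and $\langle\cdot,\cdot\rangle$ denotes the inner product in
$L^2(\mathbb{R}^n)$. □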


Note that for the family $\Psi(a, b)$ defined by (2.1), the dilation index $l$ is a
scalar, and the scalar dilation parameter $a^l$ is shared by all the dimensions of a
wavelet. The proof of this theorem is given in Appendix A.1.
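Conditions (2.2) and (2.3) can be probed numerically for a candidate wavelet. The sketch below assumes a radial Mexican-hat-type Fourier profile $\hat{\psi}(\omega) = \|\omega\|^2 e^{-\|\omega\|^2/2}$ and $a = 2$; both choices, and the function names, are illustrative assumptions. It checks only the first two conditions: it says nothing about the decay condition (2.4) and does not produce the threshold $b_0$.

```python
import numpy as np

def radial_psi_hat(r):
    """Assumed radial Fourier profile of a Mexican-hat-type wavelet,
    psi_hat(w) = ||w||^2 exp(-||w||^2 / 2), as a function of r = ||w||."""
    return r**2 * np.exp(-r**2 / 2.0)

def dilation_sum_bounds(psi_hat, a=2.0, l_max=40, num=2001):
    """Approximate m(psi, a) and M(psi, a) from conditions (2.2)-(2.3).

    For a radial wavelet the sum over l depends only on r = ||w||, so the
    ess inf / ess sup over ||w|| in [1, a] reduces to a 1-D scan over r.
    The infinite sum over l is truncated to |l| <= l_max.
    """
    r = np.linspace(1.0, a, num)
    l = np.arange(-l_max, l_max + 1)
    scaled = np.power(a, l.astype(float))[:, None] * r[None, :]   # a^l * r
    total = np.sum(np.abs(psi_hat(scaled))**2, axis=0)
    return total.min(), total.max()

m, M = dilation_sum_bounds(radial_psi_hat, a=2.0)
```

A strictly positive `m` and a finite `M` are what (2.2) and (2.3) ask for; a ratio `M / m` close to one indicates that the dilated copies cover the frequency axis almost uniformly, which is the situation in which the resulting frame is close to tight.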
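2.3 Multi-scaling wavelet frame

We introduce the dilation and translation matrices $D_j$ and $T$ as

$$D_j = \operatorname{diag}(a^{j_1}, \cdots, a^{j_n})$$

where $j = (j_1, \cdots, j_n) \in \mathbb{Z}^n$, and

$$T = \operatorname{diag}(b_1, \cdots, b_n) .$$

With $D_j$ and $T$ thus defined, separate dilation and translation parameters can
be used in wavelet functions. The following theorem is an analog of Theorem 1
in the multi-scaling case.


Theorem 2 Let $\psi \in L^2(\mathbb{R}^n)$. For $a \in \mathbb{R}$, $a > 1$, $b = (b_1, \cdots, b_n) \in \mathbb{R}^n$, and
$b_i > 0$, $i = 1, \cdots, n$, consider the family of translated and dilated functions of
the form

$$\Psi(a, b) = \{\psi_{j,k}(x) = \det D_j^{1/2}\, \psi(D_j x - Tk) : j, k \in \mathbb{Z}^n\} .$$

If

$$m(\psi, a) \triangleq \operatorname*{ess\,inf}_{|\omega_i| \in [1, a],\ i = 1, \cdots, n}\ \sum_{j \in \mathbb{Z}^n} |\hat{\psi}(D_{-j}\omega)|^2 > 0$$

and

$$M(\psi, a) \triangleq \operatorname*{ess\,sup}_{|\omega_i| \in [1, a],\ i = 1, \cdots, n}\ \sum_{j \in \mathbb{Z}^n} |\hat{\psi}(D_{-j}\omega)|^2 < \infty$$

$$\sup_{\eta \in \mathbb{R}^n} \left[(1 + \eta^T\eta)^{n(1+\epsilon)/2}\, \beta(\eta)\right] = C_\epsilon < \infty \quad \text{for some } \epsilon > 0$$

where

$$\beta(\eta) = \sup_{|\omega_i| \in [1, a],\ i = 1, \cdots, n}\ \sum_{j \in \mathbb{Z}^n} |\hat{\psi}(D_{-j}\omega)| \cdot |\hat{\psi}(D_{-j}\omega + \eta)| ,$$

then there exists¹ $b_0 > 0$ such that for all $b \in (0, b_0)$, the family defined above
constitutes a frame for $L^2(\mathbb{R}^n)$; i.e., there exist two constants $A > 0$ and $B < \infty$ such that
for all $f \in L^2(\mathbb{R}^n)$, the following inequalities hold

$$A\|f\|^2 \le \sum_{j,k} |\langle \psi_{j,k}, f\rangle|^2 \le B\|f\|^2 .$$

The proof of this theorem is given in Appendix A.2.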

2.4  Construction  of  wavelet  frames 

We are interested in a methodology that allows us to construct the
multi-dimensional wavelet function leading to frames; i.e., the problem is to find a
wavelet function that satisfies, together with its dilation and translation
parameters, the sufficiency conditions outlined in the above theorems. In this section
we first consider the tensor product construction of multi-scaling wavelet frames;
then we discuss possible non-product constructions.

2.4.1  Tensor  product  frames 

Let $\psi(x)$ be a tensor product of 1-dimensional wavelet functions, i.e.,

$$\psi(x) = \psi_1(x_1) \cdots \psi_n(x_n) .$$

Then,

$$\hat{\psi}(\omega) = \hat{\psi}_1(\omega_1) \cdots \hat{\psi}_n(\omega_n) .$$

Each $\psi_i(x_i)$, $i = 1, \cdots, n$, must satisfy the admissibility condition:

$$\int \frac{|\hat{\psi}_i(\omega_i)|^2}{|\omega_i|}\, d\omega_i < \infty .$$

Under mild conditions of decay, this is satisfied if we choose $\psi_i(x_i)$ such that

$$\int \psi_i(x_i)\, dx_i = 0 .$$

¹ Abusing notation, we consider element-wise bounds when we refer to vector bounds in this
thesis.

If these 1-dimensional functions can constitute frames, they must satisfy the
first two conditions outlined in Theorem 2, as applied to the 1-dimensional case,
which are necessary conditions as well [8].

Moreover, Daubechies [8] shows that in 1-D, a single condition on
the decay of $\hat{\psi}_i$, given by

$$|\hat{\psi}_i(\omega_i)| \le C_i |\omega_i|^{\alpha} (1 + |\omega_i|^2)^{-\gamma/2} \quad \text{with } \alpha > 0 \text{ and } \gamma > \alpha + 1 ,$$

is sufficient for the second and third conditions of the theorem.

Since in practice this decay condition is rather mild, for the purpose of
construction we assume that it is satisfied, and hence that all conditions of the theorem
are satisfied in 1-D.

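Hence in the multidimensional case, by using the inequalities in 1-D above,
and the fact that the infimum and supremum can now be taken over the sum in
each dimension, we have

$$m(\psi, a) = \operatorname*{ess\,inf}_{|\omega_i| \in [1, a],\ i = 1, \cdots, n}\ \sum_{j_1} |\hat{\psi}_1(a^{-j_1}\omega_1)|^2 \cdots \sum_{j_n} |\hat{\psi}_n(a^{-j_n}\omega_n)|^2 > 0 ,$$

and

$$M(\psi, a) = \operatorname*{ess\,sup}_{|\omega_i| \in [1, a],\ i = 1, \cdots, n}\ \sum_{j_1} |\hat{\psi}_1(a^{-j_1}\omega_1)|^2 \cdots \sum_{j_n} |\hat{\psi}_n(a^{-j_n}\omega_n)|^2 < \infty .$$

For the third condition, we have the following inequality,

$$\sum_{k \neq 0} \left[\beta(2\pi T^{-1}k)\, \beta(-2\pi T^{-1}k)\right]^{1/2} \le C_\epsilon \sum_{k_1} \left(1 + |2\pi b_1^{-1}k_1|^2\right)^{-(1+\epsilon)/2} \cdots \sum_{k_n} \left(1 + |2\pi b_n^{-1}k_n|^2\right)^{-(1+\epsilon)/2}$$

$$\le C_\epsilon \sum_{k_1} |2\pi b_1^{-1}k_1|^{-(1+\epsilon)} \cdots \sum_{k_n} |2\pi b_n^{-1}k_n|^{-(1+\epsilon)} .$$

Since each sum over $k_i$ converges, $i = 1, \ldots, n$, we have that the sum involving
$\beta$ converges. Moreover, as $b_i \to 0$, $i = 1, \cdots, n$, this sum tends to 0. Hence all
conditions of the theorem are satisfied.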

Therefore  the  tensor  product  construction  leads  to  valid  frames  of  wavelets. 
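To make the tensor product construction concrete, the following sketch evaluates $\psi_{j,k}(x) = \det D_j^{1/2}\,\psi(D_j x - Tk)$ for a product of 1-D Mexican hats in two dimensions and checks numerically that dilation and translation preserve the $L^2$ norm. The choice of the Mexican hat for every $\psi_i$, the values $a = 2$ and $b_i = 1$, and the function names are assumptions made for this illustration.

```python
import numpy as np

def mexican_hat(t):
    """1-D Mexican hat, used here as an assumed choice for every psi_i."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def tensor_wavelet(x, j, k, a=2.0, b=1.0):
    """psi_{j,k}(x) = det(D_j)^{1/2} * prod_i psi_i(a^{j_i} x_i - b_i k_i).

    x: array of shape (..., n); j, k: integer index vectors of length n;
    b: scalar or length-n vector of translation steps (diagonal of T).
    """
    j = np.asarray(j, dtype=float)
    k = np.asarray(k, dtype=float)
    scale = a**j                                  # diagonal of D_j
    arg = scale * x - b * k                       # D_j x - T k, element-wise
    return np.sqrt(np.prod(scale)) * np.prod(mexican_hat(arg), axis=-1)

def l2_norm_on_grid(values, step):
    """Riemann-sum approximation of the L2 norm on a uniform 2-D grid."""
    return np.sqrt(np.sum(values**2) * step**2)

# Compare the norm of the mother wavelet with that of a dilated, shifted copy.
step = 0.02
axis = np.arange(-12.0, 12.0, step)
grid = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1)

norm_mother = l2_norm_on_grid(tensor_wavelet(grid, j=[0, 0], k=[0, 0]), step)
norm_child = l2_norm_on_grid(tensor_wavelet(grid, j=[1, -1], k=[3, 1]), step)
```

The two norms should agree to within the quadrature error, which reflects the role of the factor $\det D_j^{1/2}$. Note that the dilation index differs per dimension (`j=[1, -1]`), which is exactly what distinguishes the multi-scaling family of Theorem 2 from the single-scaling family of Theorem 1.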


2.4.2  Necessary  conditions 

To make the results complete, we are interested in obtaining necessary conditions
as in the 1-D case. In particular, it would be appropriate to check whether the
admissibility condition for discrete wavelet frames has the same structure as that for
continuous wavelets in the multidimensional case. In the tensor product setup,
this follows trivially since the 1-D admissibility conditions lead to

$$\int_{\mathbb{R}^n} \psi(x)\, dx = 0 .$$

From recent extensions on the bounds for the 1-D case [5, 7], the following holds
for any frame ($i = 1, \cdots, n$ is the dimension index; $j_i$, $k_i$ are the dilation and
translation indices, respectively):

$$A_i \le \frac{2\pi}{b_i} \sum_{j_i} |\hat{\psi}_i(a^{-j_i}\omega_i)|^2 \le B_i .$$

Multiplying the above inequalities over $i = 1, \cdots, n$, we
have

$$A = A_1 \cdots A_n \le (2\pi)^n \det T^{-1} \sum_j |\hat{\psi}(D_{-j}\omega)|^2 \le B_1 \cdots B_n = B .$$

In [5], this bound is obtained for the case of Riesz bases. However, the proof
relies only on the frame condition, and therefore the above inequality is general
in that it holds for arbitrary frames (not necessarily of the tensor product type).
Another problem is to construct such arbitrary frames.


2.4.3  Non-separable  frames 

The observation that all conditions of the theorems on sufficient conditions hinge
on the boundedness and decay of terms involving $|\hat{\psi}(\cdot)|$ suggests the possibility
of multiplying the tensor product wavelet in the frequency domain by a function
of the form

$$p(\omega) = \sum_{l \in \mathbb{Z}^n} c_l\, e^{-i l^T \omega}, \quad c_l \in \mathbb{R} ,$$

which can be the Fourier series of a periodic function.

Let the new wavelet $\tilde{\psi}$ be defined by

$$\hat{\tilde{\psi}}(\omega) = p(\omega)\, \hat{\psi}(\omega)$$

where $\psi(\cdot)$ corresponds to the wavelet constructed as a tensor product. If

$$0 < \sum_l |c_l| < \infty ,$$

then the fact that

$$0 \le |\hat{\psi}(\omega)\, p(\omega)| \le \Big(\sum_l |c_l|\Big) |\hat{\psi}(\omega)|$$

implies that all conditions of Theorem 2 are satisfied.


Therefore, one can construct non-tensor product wavelet frames from the
tensor product frames. In the case of Riesz bases, similar results are obtained in
[5].


The 1-D wavelet function could be the Mexican hat, a combination of a few
sigmoids (e.g., [23]), etc. The choice of the wavelet used in networks for learning is
dictated by considerations of smoothness, implementability in analog hardware,
separability, etc. Some of these issues will be considered in chapter 3.

Radial Wavelet Frames

When we impose radial symmetry on the mother wavelet, $\hat{\phi}(\omega) = \hat{\phi}(\|\omega\|)$, the
following isotropic admissibility condition results:

$$\int_0^\infty \frac{|\hat{\phi}(\omega)|^2}{\omega}\, d\omega < \infty .$$

For instance, the radial Mexican hat function $\phi(x) = (n - \|x\|^2)\, e^{-\|x\|^2/2}$ is used
by many researchers when continuous wavelet transforms are considered, and in
particular the difference of Gaussians as an approximation to the Laplacian of the
Gaussian is popular in computer vision applications. In the discrete case, such
a radial construction is implied in the conditions of Theorem 1. It is easy to
see that the first two conditions follow directly. Moreover, if the 1-dimensional
mother wavelet is chosen according to the mild decay conditions (Daubechies [8]),
i.e.,

$$|\hat{\phi}(\omega_i)| \le C_i |\omega_i|^{\alpha} (1 + |\omega_i|^2)^{-\gamma/2} \quad \text{with } \alpha > 0 \text{ and } \gamma > \alpha + 1 ,$$

then the third condition of Theorem 1 is also satisfied. Therefore, the
construction involves the following:

1. Select a symmetric 1-D mother wavelet $\phi(x)$ and calculate its Fourier
Transform $\hat{\phi}(\omega)$.

2. Let the multi-dimensional wavelet satisfy

$$\hat{\phi}(\omega) = \hat{\phi}(\|\omega\|) .$$

The Inverse Fourier Transform of $\hat{\phi}(\omega)$ gives the radial mother wavelet
candidate for higher dimensions.
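As a quick sanity check on the radial Mexican hat mentioned above, the sketch below evaluates $\phi(x) = (n - \|x\|^2)e^{-\|x\|^2/2}$ on a 2-D grid and verifies numerically that it has zero mean, which is the spatial-domain counterpart of $\hat{\phi}(0) = 0$. The grid extent, step size and function name are assumptions made for this illustration.

```python
import numpy as np

def radial_mexican_hat(x):
    """phi(x) = (n - ||x||^2) exp(-||x||^2 / 2) for x of shape (..., n)."""
    n = x.shape[-1]
    r2 = np.sum(x**2, axis=-1)
    return (n - r2) * np.exp(-r2 / 2.0)

# Uniform 2-D grid wide enough for the Gaussian tail to be negligible.
step = 0.05
axis = np.arange(-10.0, 10.0 + step, step)
grid = np.stack(np.meshgrid(axis, axis, indexing="ij"), axis=-1)
values = radial_mexican_hat(grid)

mean_2d = np.sum(values) * step**2          # should be close to zero
energy_2d = np.sum(values**2) * step**2     # finite, positive L2 energy
```

The factor $n$ in $(n - \|x\|^2)$ is what makes the integral vanish in $n$ dimensions: under a standard Gaussian weight the average of $\|x\|^2$ equals $n$. Using the 1-D expression $(1 - \|x\|^2)$ in higher dimensions would leave a non-zero mean and violate admissibility.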


In the sequel, we shall be concerned with tensor-product wavelets. Once a
frame is selected, for any $f \in L^2(\mathbb{R}^N)$ we can write,

$$f = \sum_{m,n} c_{mn}\, \psi_{mn}$$

where

$$\psi_{mn}(x) = \det D^{1/2}\, \psi(Dx - Tb) ,$$

$$D = \operatorname{diag}(a[1]^{m}, \cdots, a[N]^{m})$$

and $T = \operatorname{diag}(n[1], \cdots, n[N])$. The coefficients $c_{mn}$ represent local information
at the space-frequency location of $m, n$. Therefore it is desirable to define the
centers of time-frequency in a rigorous fashion.

The following definitions for the $N$-dimensional case are appropriate.

$$x_c(f) = \frac{1}{\|f\|^2} \int x_1 \cdots x_N\, |f(x)|^2\, dx$$

$$\omega_c(\hat{f}) = \frac{1}{\|\hat{f}\|^2} \int_{[0,\infty)^N} \omega_1 \cdots \omega_N\, |\hat{f}(\omega)|^2\, d\omega$$

For the tensor product wavelet, the center co-ordinates are the centers in
each dimension.

In  practice,  functions  are  essentially  concentrated  in  a  spatio-spectral  region 
in  the  following  sense. 


(2.6) 


(2.7) 


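Hence the set of $(m, n)$ is truncated to a finite index set $\mathcal{I}$, and we need to
be precise about the error incurred by truncating to finitely many $(m, n)$. We work
with the tensor product construction. We note again the abuse of notation in
using inequalities and bounds in the vector case, i.e., since $x \in \mathbb{R}^N$, $\omega \in \mathbb{R}^N$, the
vector inequalities involving $x$ and $\omega$ are taken elementwise. First we note that
$c_{mn} = \langle f, S^{-1}\psi_{mn}\rangle$, where $S$ is the frame operator defined
earlier. $S^{-1}\psi_{mn}$ is called the dual frame $\tilde{\psi}_{mn}$. Hence

$$\Big\| f - \sum_{(m,n) \in \mathcal{I}} \langle f, \tilde{\psi}_{mn}\rangle\, \psi_{mn} \Big\| = \sup_{\|h\| = 1} \Big| \langle f, h\rangle - \sum_{(m,n) \in \mathcal{I}} \langle f, \tilde{\psi}_{mn}\rangle \langle \psi_{mn}, h\rangle \Big|$$

$$= \sup_{\|h\| = 1} \Big| \sum_{(m,n) \notin \mathcal{I}} \langle f, \tilde{\psi}_{mn}\rangle \langle \psi_{mn}, h\rangle \Big|$$

$$\le \sup_{\|h\| = 1} \sum_{m < m_l \text{ or } m > m_u}\ \sum_{n \in \mathbb{Z}^N} (\cdot) \; + \; \sup_{\|h\| = 1} \sum_{m_l \le m \le m_u}\ \sum_{nb > a^{-m}x_u + x_u^{\epsilon} \text{ or } nb < a^{-m}x_l - x_l^{\epsilon}} (\cdot)$$

where $(\cdot)$ stands for the summand $|\langle f, \tilde{\psi}_{mn}\rangle \langle \psi_{mn}, h\rangle|$.
Here $x_l^{\epsilon}, x_u^{\epsilon}$ are used to consider a small region just beyond the boundaries
of the spatial region defined by the set $\mathcal{I}$.

Define a cover $B_\epsilon$ to $\mathcal{I}$ as

$$B_\epsilon(\omega_l, \omega_u; x_l, x_u) = \left\{ (m, n),\ m, n \in \mathbb{Z}^N :\ m_l \le m \le m_u, \text{ and } a^{-m}x_l - x_l^{\epsilon}(\epsilon, m_l, m_u) \le nb \le a^{-m}x_u + x_u^{\epsilon}(\epsilon, m_l, m_u) \right\} .$$

The notion here is that there exist lower bounds on $x_l^{\epsilon}, x_u^{\epsilon}, m_u$ and an upper
bound on $m_l$ to meet the essential spatio-spectral concentration of the function
to the region $[x_l, x_u] \times [\omega_l, \omega_u]$. Using the Cauchy-Schwarz inequality, we can
derive the following, paralleling the one-dimensional case studied in [8]. Coarse
estimates for $m_l$, $m_u$, $x_l^{\epsilon}$, $x_u^{\epsilon}$ that depend on the decay factor of the mother
wavelet and the spectral limits are used in this derivation to show the desired
results.

$$\Big\| f - \sum_{(m,n) \in B_\epsilon(\omega_l, \omega_u; x_l, x_u)} \langle f, \tilde{\psi}_{mn}\rangle\, \psi_{mn} \Big\| \le \sqrt{B/A} \left[ \Big( \int_{\omega \notin [\omega_l, \omega_u]} d\omega\, |\hat{f}(\omega)|^2 \Big)^{1/2} + \Big( \int_{x \notin [x_l, x_u]} dx\, |f(x)|^2 \Big)^{1/2} + \epsilon \|f\| \right]$$

If we use the definitions in 2.6, 2.7, with the same $\epsilon$, we have that

$$\Big\| f - \sum_{(m,n) \in B_\epsilon} c_{mn}\, \psi_{mn} \Big\| = O(\epsilon) ,$$

which is the desired result consistent with the intuition of essential
spatio-spectral concentration, i.e., it suffices to use the nodes that fall close to the
spatio-spectral region.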
The number of nodes required is given by

$$\# B_\epsilon = \prod_{i=1}^{N} \sum_{m_i = m_{li}}^{m_{ui}} \left[ \frac{a^{-m_i}x_{ui} + x_{ui}^{\epsilon}}{b} - \frac{a^{-m_i}x_{li} - x_{li}^{\epsilon}}{b} \right]$$

$$\approx \prod_{i=1}^{N} b^{-1} (x_{ui} - x_{li})\, a^{-m_{ui}} \left( \frac{a^{m_{ui} - m_{li} + 1} - 1}{a - 1} \right)$$

neglecting the effects of $x_{li}^{\epsilon}$, $x_{ui}^{\epsilon}$.


This detailed theory provides the means for uniformly approximating any
function $f \in L^2(\mathbb{R}^N)$ to a desired degree of accuracy, and translates into a
network formulation. Such a formulation is studied in the following chapter.
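The node count can be turned into a few lines of code, which also makes the growth with the input dimension $N$ visible. The sketch below is written under stated assumptions: translations at level $m$ are taken to cover the interval $[a^{-m}x_l, a^{-m}x_u]$ in steps of $b$, the margins $x^{\epsilon}$ are neglected, and fractional counts are not rounded. The function names and the numerical values are hypothetical.

```python
import numpy as np

def node_count_direct(a, b, x_l, x_u, m_l, m_u):
    """Sum, per dimension, the number of translation steps of size b that fit
    in the dilated interval [a^{-m} x_l, a^{-m} x_u], then take the product
    over dimensions. The margins x^eps are neglected."""
    total = 1.0
    for xl, xu, ml, mu in zip(x_l, x_u, m_l, m_u):
        levels = np.arange(ml, mu + 1)
        total *= np.sum(a ** (-levels.astype(float)) * (xu - xl) / b)
    return total

def node_count_closed_form(a, b, x_l, x_u, m_l, m_u):
    """Same count with the geometric sum over dilation levels in closed form."""
    total = 1.0
    for xl, xu, ml, mu in zip(x_l, x_u, m_l, m_u):
        geometric = a ** (-mu) * (a ** (mu - ml + 1) - 1.0) / (a - 1.0)
        total *= (xu - xl) / b * geometric
    return total

# Same region [-1, 1] and dilation range -3..2 in every dimension.
counts = {
    N: node_count_closed_form(2.0, 0.5, [-1.0] * N, [1.0] * N, [-3] * N, [2] * N)
    for N in (1, 2, 3)
}
direct_3d = node_count_direct(2.0, 0.5, [-1.0] * 3, [1.0] * 3, [-3] * 3, [2] * 3)
```

With these values a single dimension needs 63 nodes, so the count grows as $63^N$: 3969 nodes for $N = 2$ and 250047 for $N = 3$. This exponential growth is the ‘curse of dimensionality’ that motivates the heuristic procedures of the next chapter.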
Chapter 3


Wavelet-based  Networks  and  Learning 


The methodology for learning involves training a network either on-line or
off-line based on a set of observations, so that the resulting error in approximation is
within  acceptable  limits.  Such  networks  should  be  able  to  ‘generalize’,  meaning 
that  when  presented  with  input  not  used  in  training,  the  network  should  be 
able  to  approximate  the  mapping  well.  This  ability  is  the  key  to  learning,  but 
achieving  this  is  the  most  difficult  part  of  the  learning  process  since  the  training 
set  used  cannot  in  general  adequately  represent  the  whole  input  space.  We  will 
formalize  these  notions  and  consider  the  issues  arising  out  of  this  dilemma. 

3.1  Generalization  Error,  Network  Structure  and  the 
Size  of  the  Training  Set 

The generalization error is defined as

$$\|f - f_{net}\| = \epsilon_{gen}$$

where $f$ is the desired mapping $\mathbb{R}^n \mapsto \mathbb{R}$, and $f_{net}$ is the mapping provided
by the network. Now, we note that the learning process gives rise to two distinct
types of errors, viz.


1. The approximation error, $f - f_{approx}$, which results from the fact that a
finite amount of resources (nodes or neurons) is used to approximate the
function. This tells us that approximation theory can be used as a means
of determining the size of the network to be used.

2. The estimation error, $f_{approx} - f_{net}$, which results from the fact that the
coefficients or weights are estimated from a finite amount of data. Thus
the nature and size of the training set used has to come from statistical
considerations.


Thus one can write,

$$\epsilon_{gen} \le \|f - f_{approx}\| + \|f_{approx} - f_{net}\| \tag{3.1}$$

$$= \epsilon_{approx} + \epsilon_{est} \tag{3.2}$$
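The decomposition in (3.1) and (3.2) can be illustrated with a small numerical experiment. In the sketch below everything specific is an assumption made for the example: the target function, the nine-node Mexican hat dictionary, the noise level, and the use of a least-squares fit on dense noise-free data as a stand-in for $f_{approx}$. It is not the procedure used in the thesis.

```python
import numpy as np

def mexican_hat(t):
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def design_matrix(x, centers, scale):
    """Columns are wavelets psi(scale * (x - c)), one per centre c."""
    return mexican_hat(scale * (x[:, None] - centers[None, :]))

def grid_norm(v, step):
    """Discrete stand-in for the L2 norm on a uniform grid."""
    return np.sqrt(np.sum(v**2) * step)

rng = np.random.default_rng(0)
target = lambda x: np.sin(2.0 * x) * np.exp(-x**2 / 8.0)   # hypothetical f

# Fixed, finite dictionary: this limits what the network can represent.
centers = np.linspace(-4.0, 4.0, 9)
scale = 1.0

# f_approx: best fit within the dictionary, from dense noise-free data.
x_dense = np.linspace(-4.0, 4.0, 801)
step = x_dense[1] - x_dense[0]
Phi_dense = design_matrix(x_dense, centers, scale)
w_approx, *_ = np.linalg.lstsq(Phi_dense, target(x_dense), rcond=None)

# f_net: same dictionary, weights estimated from a few noisy samples.
x_train = rng.uniform(-4.0, 4.0, size=30)
y_train = target(x_train) + 0.05 * rng.standard_normal(30)
Phi_train = design_matrix(x_train, centers, scale)
w_net, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)

f = target(x_dense)
f_approx = Phi_dense @ w_approx
f_net = Phi_dense @ w_net

eps_approx = grid_norm(f - f_approx, step)     # limited by network size
eps_est = grid_norm(f_approx - f_net, step)    # limited by training data
eps_gen = grid_norm(f - f_net, step)
```

Adding nodes to the dictionary lowers `eps_approx`, while adding training samples or reducing the noise lowers `eps_est`; the triangle inequality guarantees that `eps_gen` never exceeds their sum. This is the trade-off between network complexity and sample complexity discussed in the text.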


However, in this thesis we will not obtain statistical bounds on these errors.
Recent papers by Barron [2] and by Niyogi and Girosi [22] provide good analyses in
similar contexts in neural networks, and it should be possible to perform similar
analyses in our case. It suffices to note that the motivations for these analyses
come from fundamental problems in “learning”, viz., how to trade off the above
two errors, i.e., $\epsilon_{approx}$ and $\epsilon_{est}$, so that the resulting error $\epsilon_{gen}$ is within
acceptable limits. These questions are related to determining the network complexity
(i.e., network size) and the size of the training set (sample complexity) that are
optimal. Results reported in the literature agree with the empirical evidence
that several combinations of these two parameters optimally result in the same
generalization error. As a result, the choices are in practice determined by the
availability of resources for network units and for data collection. In other
words, if a large amount of data can be obtained, it is possible to use this high
“information content” or “feature content” to obtain more compact networks.
Conversely, when only a small amount of data is available, it is possible to arrive
at the same generalization error by using a larger network. Again, this assertion
is supported by theoretical results as well as empirical observations (simulations).

This discussion suggests one among several important reasons to develop
on-line sequential learning strategies that build near-optimal networks, provided
a sufficiently large amount of data is available. Because the data are presented only once¹,
there is no need to store a large amount of data. Other reasons will become
apparent in later sections of this chapter.

3.2  Local  versus  Global  Learning 

Schemes based on ‘global’ and ‘local’ learning have distinct
properties, most of which are well known in the literature. In global learning,
no association can be made between a subset of the input space and the
adjustable elements (weights). At every instant of weight adjustment, all weights
get adjusted. This has the advantage of resulting in a compact network and
better generalization, but one has to contend with poor accuracy and sensitivity.
The multi-layer sigmoidal networks in widespread use are global learning
networks. In contrast, local learning is characterized by weights corresponding to
a small region of the input space, a higher degree of accuracy, a smaller number
of weight adjustments, etc., on the positive side, and a larger number of units,
poor generalization capability because of too close a fit, etc., on the negative
side. However, there are certain applications where a local mapping is highly
desirable. When training data are obtained from on-line interactions with the
system to be modelled, the training samples may tend to fixate on a certain
region of the input space. This can harm the generalization capability in global
learning since all weights are repeatedly adjusted (see, for instance, [10] for a
discussion of some related issues in learning control applications). In certain cases,
accuracy is the predominant concern and the requirement for larger memory is
acceptable. In other cases where local learning is essential, but large memory
cannot be accommodated, techniques for reducing the ‘curse of dimensionality’
need to be developed. The Gaussian Radial Basis Function networks and the
Wavelet Networks (WNs) used in this work are local learning networks.

¹ Repeated presentation may be necessary when the same strategies are used off-line, but a
small amount of data is sufficient in this case.


3.2.1  Training  Local  Learning  Networks 

A claim is often made that the linear-in-the-parameters structure of local learning
networks (such as the GRBFNs and WNs) simplifies training compared to the
extremely slow back-propagation procedure used in sigmoidal neural networks.
Such simplicity is deceptive unless techniques that address the issue of how to
select the ‘basis’ or ‘receptive field’ functions are developed. In many problems,
this can again necessitate gradient based nonlinear optimization procedures. In
the wavelet framework, this problem comes down to selecting the appropriate
sets of dilations and translations (also referred to as the dictionary) in an efficient
manner. Once this is done, training is of course easier than back-propagation,
and several methods can be used for determining the coefficients depending on
whether the training is on-line or off-line and on the amount of computation. In
the next few sections of this chapter we give details of our methodology.

3.2.2  Theoretical  Difficulties  of  Network  Construction 

Existing methodologies to construct sigmoidal networks are lacking in several
respects: to rephrase the discussion at the start of the chapter, for a specified
tolerance, questions such as how many nodes are necessary in a hidden
layer, how many layers are necessary, and how many training samples are
needed have only partial answers at the present time. Several researchers have
recently shown that feed-forward neural networks are universal approximators
(see, e.g., [16]), and that a single hidden layer is sufficient to approximate any
arbitrary nonlinear function to a desired accuracy provided a sufficient number
of neurons are used; however, sufficiency does not lead to any rigorous procedures
for network construction.

Why use wavelets?

Pati  and  Krishnaprasad  [23]  circumvented  these  questions  to  some  extent  by 
using  discrete  wavelet  transforms  in  place  of  sigmoids;  however,  their  work  is 
primarily  applicable  to  1-D  problems;  adequate  treatment  of  multi-dimensional 
wavelet  theory,  and  problems  faced  due  to  the  ‘curse  of  dimensionality’  are  not 
available. 

One advantage with wavelets is that the problems of input pre-processing,
scaling, etc., are avoided (the network is inherently structured in this way).
Similarity of wavelets to Radial Gaussians also makes wavelets attractive in
applications where Gaussians have been successfully employed. Indeed, wavelet
theory provides a natural basis for multi-resolution hierarchical schemes, unlike
the artificially imposed multi-resolution hierarchies in the case of Gaussians as
used in vision applications. Such multi-resolution schemes are intuitively
appealing since they possess the ability to zoom in on areas of high-frequency
concentration, analogous to the way the human brain processes information.
Moreover, because of the above structure, wavelets offer the possibility of
obtaining more compact networks, though this is not a universally applicable claim.


3.2.3  Problems  Faced  in  High  Dimensions 

Many wavelet/neural learning problems attacked in the literature concern
one-dimensional applications, with no straightforward extension to
high-dimensional cases. Such studies have very little utility in practical problems since
neural networks find their usefulness predominantly in high-dimensional
applications. High-dimensional problems pose more challenges and much remains to
be done in the direction of achieving good generalization at significantly reduced
complexity. If on-line sequential adaptation is attempted in a high-dimensional
setting, the problems become still more complicated. In particular, traditional
methods would require storage of large amounts of past data, which is difficult,
and moreover the function to be learnt can be highly non-stationary. Information
theoretic considerations such as Cross-Validation/Generalized Cross-Validation,
the Minimum Description Length Principle or Akaike’s Information Criterion are
not applicable sequentially on-line because of the repetitive nature of these
nonlinear optimization procedures.


We have shown in chapter 2 how to construct multi-dimensional wavelet
frames, and how these frames can be used to approximate functions in
high-dimensional spaces. This theory maps directly into a network configuration.

3.2.5  Determining  the  Spatio-Spectral  Centers  and  the 
Coefficients 

The first issue in the wavelet network construction is to determine the frequency
content of the system function, which is assumed to be essentially band-limited
in space and frequency in $x \in [x_i, x_m]$. If somehow this information is
known a priori, the space-frequency region on which the approximation is to
be attempted becomes clear. Theoretically, one would then try to select all the
spatio-spectral centers that fall within the region of interest. The computation of
coefficients would be straightforward theoretically. However, the number of such
centers increases enormously with each additional dimension, and this would
make the methodology devoid of any practical merits. Several approaches can
be followed to give realistic solutions depending on the nature of the problem in
hand:

1. The dimension of the input space

2. The amount of data and whether on-line or off-line

3. The degree of approximation desirable

4. A priori knowledge of smoothness information if any.

Figure 3.1: Spatio-Spectral Distribution (horizontal axis: Translations, n)

Figure 3.2: Typical Receptive Fields in Two Dimensions (panels: Receptive
Fields for Tensor Product wavelet; Receptive Fields for Radial wavelets)
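The growth in the number of centers can be made concrete with a rough count. The sketch below is only illustrative: it assumes the node spacing $a^{-m}b$ at dilation level $m$ and a tensor-product lattice in which every dimension carries its own set of dilation-translation pairs. The function names and the example values (interval width 6, $a = 2$, $b = 1$, three dilation levels) are hypothetical.

```python
import math

def centers_per_dimension(width, a, b, levels):
    """Count lattice nodes in one dimension over the given dilation levels.

    Assumes adjacent nodes at level m are a**(-m) * b apart, so about
    width * a**m / b intervals (plus one end node) fit into the region.
    """
    return sum(math.floor(width * a ** m / b) + 1 for m in levels)

def tensor_product_centers(width, a, b, levels, n_dims):
    """Count centers when each dimension has its own (dilation, translation) pair."""
    return centers_per_dimension(width, a, b, levels) ** n_dims

if __name__ == "__main__":
    for n_dims in (1, 2, 3, 4):
        print(n_dims, tensor_product_centers(6.0, 2.0, 1.0, range(3), n_dims))
```

Even for this small region the count rises from 45 centers in one dimension to 91125 in three, which is the practical difficulty the heuristic selection is meant to avoid.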

In  practice  attempts  at  calculating  the  spectral  content  based  on  the  training 
data  are  bound  to  be  fruitless  unless  one  considers  low  dimensions  and  a  small 
set  of  training  data.  The  methodology  we  propose  in  the  next  section  obviates 
the  need  to  perform  such  calculations  of  spectral  content.  We  assume  that  there 
is  no  significant  constraint  on  the  amount  of  data  that  can  be  gathered  on-line. 


3.3  A  Heuristic  Methodology  for  Dynamic  Selection 


Consider each dimension separately; the same procedure will be performed in
each dimension. Let $L(l)$, $M(l)$, $l = 1, \dots, n$ be the low and high frequency
limits. Choose $L(l)$ corresponding to the case in which the entire spatial region
$[x_i(l), x_m(l)]$ is covered by two nodes (translations), i.e.,

$$L(l) = \frac{\log(x_m(l) - x_i(l)) - \log(b(l))}{\log(a)}$$

There is no need to know $M(l)$ except that knowledge of $M(l)$ can provide a
rough upper bound on the number of dilation levels to be used. But this is not
essential to the method.

It is now possible to build on successive levels of dilations on-line. We observe
that in figure 3.1, the width between adjacent nodes for a dilation level $m$ is
given by $a^{-m}b$. This information can be used with information on the nearest
neighbour nodes to develop the following strategy. Theoretical justification will
be given in the next section.

•  Initialize: 


m(l)  =  L(l) 
d(l)  = 

Set  first  node  (translation  value)  to  x<^(0  rounded  to  the  nearest  integer. 


• Begin on-line; determine nearest(l) (the distance to the nearest existing
translation node).

If $\mathrm{nearest}(l) > d(l)$ (see footnote 2) and $|y_{net} - y| > \epsilon$, add a
new translation node at

$$n(l) = \frac{x(l)\,a^{L(l)}}{b(l)}, \quad \text{rounded.}$$

• Select the current weight for the new node as $c_{current} =$ […] where
$\psi(x) = \psi_1 \cdots \psi_n$, calculated at spatio-spectral locations $m(l), n(l)$.

If a new node is not selected for the present data, adapt the coefficients using
the LMS [32] algorithm (see footnote 3).

When the network has learned sufficiently at this level (this is determined by
the fact that no new node is added for a sufficient number of consecutive data
points, as shown by a flag), set $m(l) = m(l) + 1$.

Continue this on-line. After a sufficient number of nodes are learnt, the algorithm
automatically stops adding new units.

2 Considerations on this choice are detailed in the next section.

3 Other algorithms such as RLS-Kalman or variants thereof can also be used at the expense
of increased complexity of implementation.
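The strategy above can be sketched for a single input dimension as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the distance threshold is taken as a fraction `d_factor` of the node spacing $a^{-m}b$, the new node is placed using the current level $m$, and the initial weight of a new node is chosen so that the network error at the current sample becomes zero. The class name and all default parameter values are hypothetical.

```python
import math

def mexican_hat(t):
    """Mexican hat mother wavelet (1 - t^2) exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

class SequentialWaveletNet1D:
    def __init__(self, a=2.0, b=1.0, start_level=0, eps=0.05,
                 d_factor=0.5, lr=0.2, patience=50):
        self.a, self.b = a, b
        self.m = start_level        # current dilation level m(l)
        self.eps = eps              # error threshold epsilon
        self.d_factor = d_factor    # assumption: d = d_factor * a**(-m) * b
        self.lr = lr                # LMS step size
        self.patience = patience    # quiet samples before moving to level m + 1
        self.coeffs = {}            # (m, n) -> coefficient
        self.quiet = 0              # samples seen since the last node was added

    def unit(self, m, n, x):
        return mexican_hat(self.a ** m * x - n * self.b)

    def predict(self, x):
        return sum(c * self.unit(m, n, x) for (m, n), c in self.coeffs.items())

    def observe(self, x, y):
        """Process one sample; return True if a new node was added."""
        err = y - self.predict(x)
        spacing = self.a ** (-self.m) * self.b
        centers = [n * spacing for (m, n) in self.coeffs if m == self.m]
        nearest = min((abs(x - c) for c in centers), default=math.inf)
        n_new = round(x / spacing)          # x * a**m / b, rounded
        psi = self.unit(self.m, n_new, x)
        if (nearest > self.d_factor * spacing and abs(err) > self.eps
                and (self.m, n_new) not in self.coeffs and abs(psi) > 1e-3):
            # Assumed initialization: cancel the current error at this sample.
            self.coeffs[(self.m, n_new)] = err / psi
            self.quiet = 0
            return True
        # No node added: LMS update of all existing coefficients.
        for (m, n) in self.coeffs:
            self.coeffs[(m, n)] += self.lr * err * self.unit(m, n, x)
        self.quiet += 1
        if self.quiet >= self.patience:     # learned enough at this level
            self.m += 1
            self.quiet = 0
        return False
```

Feeding a stream of `(x, y)` pairs to `observe` grows the dictionary only where the error is large and no nearby node exists; elsewhere the existing coefficients are adapted.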

If training is desired off-line, optimization of this initial model can be
performed based on orthogonal least squares (OLS) [4] or the orthogonal matching
pursuit (OMP) [24], which is similar to OLS.

3.4  Justification 

The methodology proposed above can be justified based on geometric model
growth. Work in a similar spirit for Platt’s RAN can be found in [17].


Figure  3.3:  The  Geometric  Picture 

The geometric picture is illustrated in figure 3.3. Notice that
$f_n = \sum_{i=1}^{n} c_i \psi_i$ implies that

$$f_n = f_{n-1} + e_n \quad \text{with} \quad e_n = c_n \psi_n.$$

The best approximation $f_{Proj_{n-1}}$ to $f$ that one can get from a set of frames
$\{\psi_i, i = 1, \dots, n-1\}$ is the projection of $f$ onto the space spanned by the set
$\{\psi_i, i = 1, \dots, n-1\}$. Our first problem is to decide whether a new ‘basis
unit’ needs to be added at this stage. Such a decision obviously can be based on
whether the projection onto the span is inadequate, i.e., whether
$\|f_n - f_{Proj_{n-1}}\| = |e_n| \sin(\alpha_n)$ exceeds an allowable threshold $\epsilon$.
Here $|e_n| = \|f_n - f_{n-1}\|$.

$|e_n|$          $|\sin(\alpha_n)|$   $|e_n||\sin(\alpha_n)|$     Decision         Interpretation
$< \epsilon_1$   $< \epsilon_2$       $< \epsilon_1\epsilon_2$    Use LMS          The Projn. is sufficient
$< \epsilon_1$   $> \epsilon_2$       $< \epsilon_1$              Use LMS          The Projn. is sufficient
$> \epsilon_1$   $< \epsilon_2$       ?                           Use LMS, flag    Need more data (footnote 4)
$> \epsilon_1$   $> \epsilon_2$       $> \epsilon_1\epsilon_2$    Add a new node   The Projn. is inadequate

Table 3.1: Possible Combinations and Actions

Table 3.1 shows the four possibilities. Only the fourth case warrants addition
of a new ‘unit’, while the third case suggests via a flag that new dilation levels
may be needed at the particular locality, or at least more data are needed.
Therefore we can separate the condition in terms of $|e_n|$ and $|\sin(\alpha_n)|$. Now it
is difficult to calculate $\alpha_n$. However, notice that $e_n \cos(\alpha_n)$ lies in the span of
$\{\psi_1, \dots, \psi_{n-1}\}$. Therefore the condition on $\sin(\alpha_n)$ can be recast as

$$\sup_{i=1,\dots,n-1} \{|\langle \psi_n, \psi_i \rangle|\} < 1 - \delta \quad \text{where } \delta < 1.$$
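The four rows of Table 3.1, with the angle condition replaced by the inner-product test, reduce to a small decision rule. The sketch below is one hedged reading of that rule: the function name, its arguments and the returned labels are hypothetical, and `max_abs_cos` is assumed to be the largest absolute inner product between the normalized candidate unit and the normalized existing units.

```python
def decide(err_norm, max_abs_cos, eps1, delta):
    """Return (action, flag) for one sample, following Table 3.1.

    err_norm     -- |e_n|, size of the current approximation error
    max_abs_cos  -- sup_i |<psi_n, psi_i>| over the existing units (assumed normalized)
    eps1         -- threshold on |e_n|
    delta        -- margin in the test sup_i |<psi_n, psi_i>| < 1 - delta
    """
    large_error = err_norm > eps1
    # The candidate is far from collinear with every existing unit,
    # i.e. |sin(alpha_n)| is treated as large.
    large_angle = max_abs_cos < 1.0 - delta
    if large_error and large_angle:
        return "add_node", False      # row 4: projection inadequate
    if large_error:
        return "lms", True            # row 3: adapt, flag that more data or levels may be needed
    return "lms", False               # rows 1 and 2: projection sufficient
```

For example, `decide(0.3, 0.2, eps1=0.05, delta=0.5)` selects a new node, while the same error with `max_abs_cos=0.9` only adapts the coefficients and raises the flag.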

3.4.1  Interpreting the Condition on $\langle \psi_n, \psi_i \rangle$

For this calculation we make a choice for the wavelet with the understanding
that the same method can be used for other wavelet functions. We consider the
Mexican hat $(1 - x^2)\,e^{-x^2/2}$. The tensor-product of this wavelet in two dimensions
is shown in figure 3.4.

Figure 3.4: The Tensor Product of Mexican Hat (Mexican Hat Wavelet by Tensor
Multiplication)

Also we note that our tensor product results in

$$\langle \psi_n, \psi_i \rangle = \langle \psi_{n_1}, \psi_{i_1} \rangle \cdots \langle \psi_{n_N}, \psi_{i_N} \rangle.$$

Therefore by calculations shown in Appendix A.3 we arrive at the following
reduction. $\forall j \in \{1, \dots, n-1\}$,

$$\langle \psi_n, \psi_j \rangle = e^{-\frac{a^{2m} d_j^2}{4}} \left(1.0 - a^{2m} d_j^2 + \frac{1}{12} a^{4m} d_j^4\right)$$

where $d_j$ is the distance between the centers of $\psi_n$ and $\psi_j$.

4 In addition to finding the projection, the flag indicates that at this local region additional
nodes may be required.

It is interesting to note that graphically $\langle \psi_n, \psi_i \rangle$ takes the form of the
mother wavelet and that $\cos(\alpha_{nj})$ can take negative values (i.e., the angle $\alpha_n$
can be greater than $\pi/2$). In the case of Gaussians (GRBFNs), we can easily
verify that the form of this product is again Gaussian, and that $0 < \alpha_n < 90^\circ$.
This shows clearly the differences between the Gaussian and the Wavelet cases.
An important property emerges here: if we maintain the distance $d$ at
zero-crossing points, we get $\alpha_n = 90^\circ$: orthogonality between nodes. However, notice
that this orthogonality does not hold across all multiples of the distance since the
function has only four (symmetric) zero-crossings in each dimension that are not
integer multiples and hence cannot generate a regular lattice. Also notice that
between the first and second zero-crossings, the absolute value of $\cos(\alpha_n)$ can
be high. Since near-orthogonality is desirable, the aforementioned observations
suggest that one can choose the distance either to coincide with any of the
zero-crossings or to be around them. In the case where a condition of the form
$|d| > d_0(m)$, where $d_0(m)$ is a fixed distance for a given $m$, is desired, it is possible
to attempt $d_0(m) >$ the distance to the second zero-crossing.
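The location of the zero-crossings can be checked numerically. The sketch below assumes unit dilation and the Mexican hat $(1 - x^2)e^{-x^2/2}$; it compares a trapezoidal estimate of the normalized inner product between $\psi(x)$ and $\psi(x - d)$ with the closed form $(1 - d^2 + d^4/12)\,e^{-d^2/4}$, which is worked out here for this particular wavelet rather than quoted from Appendix A.3. The function names and the integration limits are illustrative choices.

```python
import math

def mexican_hat(t):
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cos_angle_numeric(d, half_width=12.0, step=0.01):
    """Normalized inner product of psi(x) and psi(x - d), trapezoidal rule."""
    n = int(round(2 * half_width / step))
    inner = norm = 0.0
    for k in range(n + 1):
        x = -half_width + k * step
        w = 0.5 if k in (0, n) else 1.0
        inner += w * mexican_hat(x) * mexican_hat(x - d)
        norm += w * mexican_hat(x) ** 2
    return inner / norm

def cos_angle_closed(d):
    """Closed form for unit dilation: (1 - d^2 + d^4 / 12) exp(-d^2 / 4)."""
    return (1.0 - d * d + d ** 4 / 12.0) * math.exp(-0.25 * d * d)

# Positive roots of 1 - d^2 + d^4 / 12 = 0; the other two follow by symmetry.
zero_crossings = [math.sqrt(6.0 - math.sqrt(24.0)),
                  math.sqrt(6.0 + math.sqrt(24.0))]

if __name__ == "__main__":
    for d in (0.0, 0.5, zero_crossings[0], 2.0, zero_crossings[1]):
        print(round(d, 4), cos_angle_numeric(d), cos_angle_closed(d))
```

Under these assumptions the two positive zero-crossings fall near $d \approx 1.05$ and $d \approx 3.30$, and the inner product is negative between them, in line with the remark that $\cos(\alpha_n)$ can take negative values.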

In the case of Gaussians, such orthogonality between any nodes obviously
cannot exist. Choice of distance for near orthogonality is possible. It has been
brought to our notice that in a different context, Holcomb and Morari [15]
suggested some ad hoc procedures in related spirit to this work for forcing
orthogonality by using particular penalty terms that help spread out the basis centers
in RBFN learning.

Figure 3.5: The Variation of $\cos(\alpha_n)$ with $d$, in 2D (the value of cos(.) as a
function of d, for dilations 0 and -1)

Figure 3.5 shows the value of $\cos(\alpha_n)$, which results from the multiplication
of the values using separate dilations in each dimension for the 2-Dimensional
case.

3.5  Implementation  Issues 

An important consideration in implementation is separability. Separability of
the wavelet makes the network more amenable to parallel hardware
implementation. Since sigmoids are more easily implemented than Gaussians or Mexican
Hat, wavelet frames can be constructed by superposition of sigmoids. Pati and
Krishnaprasad [23] give more details on numerical procedures for constructing
frames from sigmoids. Separability can also be used to advantage in constructing
efficient algorithms such as the LMS Tree (see for instance [29]) under certain
restrictive conditions. Since we are interested in more general cases, we choose
to implement the sequential learning strategy we develop. We may note that
the Gaussian RBF networks are separable and they allow the addition of
successive dimensions consistent with biological observations [27]. However, as we have
shown in chapter 2, Radial Wavelets are not in general known to be separable
frames.

Figure 3.6: The Wavelet Network

Choice of the wavelet should be based on the expected structure of the
function. Smoothness and symmetry should be considered. For logical functions
that involve switching between logic states, the Haar Basis, which happens to
be orthonormal as well, though not symmetric, is a sensible choice. Many other
orthonormal wavelet frames have complicated expressions and large supports.
They do not lend themselves readily to hardware implementation, and large
receptive fields could be detrimental to attempting local learning.

Because  we  assume  piece-wise  smooth  nonlinear  functions  with  high  spatial 
variability,  we  choose  to  experiment  with  the  Mexican  Hat  in  a  tensor  product 
form.  Although  using  single-scaling  wavelets  in  radial  form  can  simplify  the 
learning  and  reduce  the  number  of  network  nodes,  we  would  like  to  test  the 
effectiveness  of  our  methods  under  more  general  conditions. 

For the Mexican Hat wavelet, from numerical calculations given in Daubechies
[8], for the dilation $a = 2$, values of $b > 0$ which satisfy the frame theorems given
in chapter 3 are selected to be $b \in (0.25, 1.875)$. Since we require that the
redundancy in frames be kept to a minimum, it is important to keep the ratio of frame
bounds $B/A$ as close to one as possible. This factor, along with the number of
dilation-translation nodes that can be allocated, will influence the choice of $b$.

Chapter 4


Connection  to  Existing  Methods 

4.1  Regularization  Theory 

In  this  section  we  discuss  regularization  theory,  and  show  how  symmetric  wavelets 
can  be  derived  from  the  regularization  framework  of  Poggio  and  Girosi  [27]. 

Given a set of data $S = \{(x_i, y_i) \in R^n \times R \mid i = 1, \dots, J\}$, regularization
theory gives the function $f$ that minimizes a functional of the form

$$H(f) = \sum_{i=1}^{J} (y_i - f(x_i))^2 + \lambda \|Pf\|^2$$

where $\lambda$ is a positive real parameter called the regularization parameter, and
$P$ is an operator that captures whatever prior information on the smoothness
of the function $f$ is available. Using the Euler-Lagrange equations associated to
the above problem, one obtains

$$f(x) = \sum_{i=1}^{J} c_i G(x; x_i) + p(x)$$

where the term $p(x)$ is a linear combination of functions spanning the null space
of $P$, and arises because terms in the null space of $P$ are invisible under the
minimization of $H(f)$, and $G(x)$ is a Green’s function of the operator $P^{\dagger}P$, with
$P^{\dagger}$ as the adjoint of $P$, and $c_i$ is given by

$$c_i = \frac{y_i - f(x_i)}{\lambda}.$$

Without looking for complicated operators one can think of the operation as
a linear filtering operation which suppresses components in unwanted frequency
bands. This fact is recognized in related literature (see, e.g., [14]). For example,
in order to arrive at Gaussian Radial Basis Functions (GRBF), one can consider
the operation as passing $f$ through a High-Pass Filter given by $e^{\|\omega\|^2/\beta}$. As we will
see later, the filtering function corresponds to A.

Since we are interested in obtaining wavelets as function approximators, it is
relevant to consider whether wavelets can be derived within the regularization
framework. There are two motivations for this. One is that wavelet theory
provides functions that range from orthonormal bases to frames and provides
flexibility of choice with possible algorithmic improvements. The other is that
although wavelet theory is now well-developed from the point of view of
functional analysis and approximation theory, in many applications one confronts
the problem of fitting a wavelet-based approximator on-line or off-line to an
unknown system described only by the set of input-output data obtained from
observations. In such problems, the relation between the number of data
available and the number of parameters to be chosen in the wavelet approximator,
the choice of the parameters either on-line or off-line, etc., are not trivial issues.
This is particularly appealing when the regular lattice structure used in wavelet
theory causes an excessive number of wavelet units. Thus embedding wavelet
theory in a regularization-statistical framework is desirable.

First we will consider whether a mother wavelet can be derived from this
theory. The strong admissibility condition of a mother wavelet $\phi$ on $R$ is

$$\int_{R} \phi(x)\,dx = 0.$$

We have also shown in chapter 2 how to construct mother wavelets in $R^n$ from a
mother wavelet in $R$. This suggests that we have to impose additional
assumptions on $P$ in the form of minimizing not only the high frequency energy, but also
the energy in a small region ($B_\epsilon = \{\omega \mid -\epsilon < \|\omega\| < +\epsilon\}$) near zero-frequency.

One can include a weight for the constant term and a polynomial $p(x)$
without causing problems in practice. In fact, $p(x) = \sum_{j=1}^{n} w_j x_j$ is sometimes used
to capture any linear dependencies that may exist.


For derivation, however, we look for a Band-Pass filtering function $B(\omega)$ that
vanishes at $\omega = 0$ and approaches zero as $\omega \to \infty$, such that it provides the
necessary filtering operation.

Then we can write the functional as

$$H(f) = \sum_{i=1}^{J} (y_i - f(x_i))^2 + \lambda \int_{R^n - B_\epsilon} \frac{|\tilde{f}(\omega)|^2}{B(\omega)}\,d\omega.$$

Minimizing the above functional with respect to $f$ by setting the functional
derivative to 0 under the limiting operation $\epsilon \to 0$ results in

$$\tilde{f}(\omega) = B(-\omega) \sum_{i=1}^{J} \frac{y_i - f(x_i)}{\lambda}\, e^{j \omega^T x_i}.$$

The above result shows that, to use the above theory in a general way to
approximate a function $f$, we have to assume a symmetric $B(\cdot)$ so that $B(\omega)$ is
real. Under this assumption, the Inverse Fourier Transform gives

$$f(x) = \sum_{i=1}^{J} c_i B(x - x_i) + p(x).$$

Here $p(x)$ and $c_i$ are taken as defined earlier. As in the case of Gaussian
Radial Basis functions, $p(x)$ is unnecessary since the null-space of the filtering
operation is empty.

The restriction on symmetry rules out many orthonormal wavelet bases that
are known in the literature. The Meyer wavelet and the Battle-Lemarie wavelet
family appear to be the only known symmetric orthonormal wavelet bases, but
it is also known that these wavelets have a large support. Several compactly
supported orthonormal wavelet bases are known, but all are non-symmetric [8].

Any symmetric wavelets (either frames or orthonormal bases) can be used.
For example, taking $B(\omega) = \|\omega\|^2 e^{-\|\omega\|^2/2}$ results in the Radial Mexican Hat:

$$(n - \|x\|^2)\, e^{-\|x\|^2/2}.$$

Tensor product of any symmetric wavelet can also be derived simply by
considering $B(\omega) = B_1(\omega_1) \cdots B_n(\omega_n)$, where $B_i(\omega_i)$ are the 1-D Band-Pass Filters for
$i = 1, \dots, n$.
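The one-dimensional case of this example is easy to check numerically. The sketch below assumes the inverse transform convention $f(x) = \frac{1}{2\pi}\int F(\omega) e^{j\omega x}\,d\omega$, under which the band-pass function $\omega^2 e^{-\omega^2/2}$ should return $(1 - x^2)e^{-x^2/2}$ scaled by $1/\sqrt{2\pi}$; the constant depends on that convention and is not taken from the text. Function names and integration limits are illustrative.

```python
import math

def inverse_ft_of_filter(x, half_width=12.0, step=0.01):
    """(1 / 2 pi) * integral of w^2 exp(-w^2 / 2) cos(w x) dw, trapezoidal rule.

    The sine part of exp(j w x) integrates to zero because the filter is symmetric.
    """
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        w = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * w * w * math.exp(-0.5 * w * w) * math.cos(w * x)
    return total * step / (2.0 * math.pi)

def scaled_mexican_hat(x):
    """(1 - x^2) exp(-x^2 / 2) / sqrt(2 pi), the expected inverse transform."""
    return (1.0 - x * x) * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

if __name__ == "__main__":
    for x in (0.0, 0.7, 1.0, 2.5):
        print(x, inverse_ft_of_filter(x), scaled_mexican_hat(x))
```

The filter vanishes at $\omega = 0$, which is what produces the zero mean of the resulting wavelet, in line with the admissibility condition stated earlier.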

So far, we have considered the derivation of mother wavelets only. This results
in an approximation scheme that uses translations only with a continuous wavelet
transform. The essence of wavelet theory lies in the fact that it provides an
elegant tool for spatio-spectral localization using dilations and translations. We
will see that for the case considered above (i.e., symmetric mother wavelets), a
translation-dilation structure with a continuous wavelet transform can be derived
within the regularization framework. For this purpose, we combine two separate
extensions for Hyper Basis Functions (HBF) [26].

• Assuming that the function is to have several levels of resolution, i.e.,
$f(x) = \sum_{m=1}^{M} f_m(x)$. Then, one can consider each level $f_m$ at a dilation
level $a^m$ in the aforementioned procedure to arrive at

$$f(x) = \sum_{m=1}^{M} \sum_{i=1}^{J} c_{mi} B(a^m (x - x_i)).$$

• Weighted Norm. One considers the weighted function $B(\|x - x_i\|_W)$.
Choose $W = \mathrm{diag}(a^{m_1}, \dots, a^{m_N})$. This norming idea arises primarily as
a means of taking into account the increased degrees of freedom that can
result in dimensionality reduction.

Thus the basic approximation scheme can be advanced without the rigorous
conditions of chapter 2, making it a suitable alternative when not using a regular
lattice. It is important to note that while the original form of regularization
theory fits a ‘basis’ corresponding to each data point, practical considerations
require that approximate techniques be used to select a smaller number of basis
elements [27]. Notice however that Regularization does not and cannot address
the issue of sequential learning.

4.2  Radial  Basis  Functions 

Radial Basis Functions provide another tool for function approximation.
Although there exist several types of radial basis functions that can be used
in approximations, the Gaussian Radial Basis Function (GRBF) of the form

l|s-*J2 

G(x\Xi)  =  j^e  has  been  the  subject  of  much  attention  because  of  its 

many desirable properties such as locality, separability, etc. A fixed value of $\beta$
and a fixed sampling lattice for $x_i = k\Delta$ can be used to study the approximation
properties given by

$$f = \sum_{k} c_k G(x - k\Delta).$$

One can use a multi-resolution scheme in which different levels of $\beta$ are used.
These functions however use a single parameter $\beta$ to control all dimensions of $x$
and are analogous to radial wavelets. In contrast, our focus is on tensor product
wavelets. Therefore the idea of using a weighted norm is relevant here. The
weighting matrix is simply a diagonal matrix with the elements corresponding
to different dimensions, i.e., the weighting matrix is $\mathrm{diag}(\beta_1, \dots, \beta_n)$. When such
schemes are used, the result is essentially similar to our wavelet methodology.
The only difference is that such structure is artificially imposed unlike the natural
structure provided by wavelet theory, and hence strategies used in the choice of
parameters may not have similar mathematical validity.

4.3  Projection Pursuit Regression (PPR)

Projection Pursuit [12] is a statistical technique that interprets high-dimensional
data through well-chosen low-dimensional projections. In PPR, this technique is
used in a successive refinement approach for non-parametric regression. Consider
the single output case:

$$f = \sum_{m=1}^{M} f_m(a_m^T x)$$

with

$$a_m^T x = \sum_{l=1}^{n} a_{ml} x_l.$$

Here the $f_m$ are single-valued ridge functions of a single variable. The
parameters $a_m^T$ as well as the functions $f_m$ are chosen to simultaneously minimize the
expected error. A forward stepwise procedure is used to select the model order
$M$. It is clear such a strategy has many similarities to the wavelet methodology
when one considers the analogy between the $m$ above and the dilation $m$.
Indeed it is this observation that led to the matching pursuit (MP) algorithm [18]
and later the orthogonal matching pursuit (OMP) [24], which orthogonalizes the
‘basis’ functions at each stage, just as the Orthogonal Least Squares (OLS) [3]
orthogonalizes the Least Squares (LS) procedure.

When one considers off-line fitting of a model based on the complete set of
observations and assumed levels of frequency content (and hence the dilations),
one can construct a set of translation-dilation indices (the dictionary) from the
data so that these data points lie in the ‘receptive field’ of the elements of the
dictionary. Then it becomes possible to apply the OMP, and it is indeed the
optimal strategy, but it will require more computations than the MP. But a
central concern to us is that such a construction of the dictionary requires a great
deal of a priori information, and is not practicable in many problems. Furthermore
we place emphasis on on-line learning, and therefore on faster methods. Our
heuristic strategy is an essential tool that can be used directly to fit the model
under this situation. Off-line, if optimality is a major concern, the dictionary
automatically selected in this procedure can be further optimized by using OMP
(normally simple strategies such as removing the dictionary elements with
insignificant weights, also called ‘wavelet shrinkage’, can be used).
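For the off-line case, the greedy selection step can be sketched as below. This is a generic orthogonal matching pursuit written for illustration, not the procedure of [24] as implemented in this work: each dictionary element is assumed to be given as a vector of its values at the training inputs, and the function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def omp(atoms, y, n_select):
    """Greedy orthogonal matching pursuit over a list of atom vectors.

    Returns the indices of the selected atoms (in selection order) and the
    final residual y - projection of y onto the span of the selected atoms.
    """
    residual = list(y)
    basis = []      # orthonormalized copies of the selected atoms
    chosen = []
    for _ in range(n_select):
        best, best_score = None, 0.0
        for j, atom in enumerate(atoms):
            if j in chosen:
                continue
            norm = math.sqrt(dot(atom, atom))
            score = abs(dot(residual, atom)) / norm if norm > 0 else 0.0
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break
        # Gram-Schmidt step: remove components along earlier selections.
        q = list(atoms[best])
        for e in basis:
            proj = dot(q, e)
            q = [qi - proj * ei for qi, ei in zip(q, e)]
        q_norm = math.sqrt(dot(q, q))
        if q_norm < 1e-12:
            break
        q = [qi / q_norm for qi in q]
        basis.append(q)
        chosen.append(best)
        coef = dot(residual, q)
        residual = [ri - coef * qi for ri, qi in zip(residual, q)]
    return chosen, residual

if __name__ == "__main__":
    atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(omp(atoms, [3.0, 3.0, 0.1], 2))
```

In the wavelet setting each atom would hold the values of one dilation-translation unit at the training inputs, so the returned indices identify which dictionary elements to keep.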
Chapter 5


Adaptive  Control  of  Nonlinear 
Systems 


We  consider  direct  adaptive  control  of  a  SISO  plant. 

5.1  Problem  Formulation 

Here we limit our analysis to the class of non-linear systems that has a well
established analytical framework in nonlinear control theory, namely, systems
that have a canonical structure of the form:

$$x_1^{(n)}(t) = f(x_1(t), \dot{x}_1(t), \dots, x_1^{(n-1)}(t)) + g\,u(t).$$

In general, $g = g(x_1, \dot{x}_1, \dots, x_1^{(n-1)})$.

Assumptions: 

We assume that $f(x)$ and $g(x)$ are sufficiently smooth, that $g^{-1}(x)$ exists (or
$|g(x)| > b_g > 0$) and is smooth in the region of our interest; the assumption
on $g^{-1}(x)$ implies that $g(x)$ is of the same sign everywhere in the region, and
without loss of generality we can take it as positive. It is also assumed that $|f(x)|$
and $|g(x)|$ are upper bounded in the region of interest by known functions $M_f(x)$
and $M_g(x)$. Measurability of the state vector $x$, and Persistency of Excitation
are assumed (this latter point will be discussed later). It is emphasized that
unless otherwise stated, no other information about the functions is assumed.

More general classes of systems can only be handled under more restrictive assumptions or on an ad hoc basis.

It is well known that many practical control problems, such as robotic manipulator control, can be reduced to the above canonical form (see, e.g., [31]); neural adaptive control schemes for plant models of this form, or minor variants thereof, have been studied by Chen and Khalil [3] using backpropagation neural networks (with hyperbolic tangent activation functions), and by others [30, 28] using Gaussian Radial Basis Functions.

Let $x = (x_1, \dot{x}_1, \ldots, x_1^{(n-1)})^T$ and $x_d = (x_{1d}, \dot{x}_{1d}, \ldots, x_{1d}^{(n-1)})^T$ be the state and desired vectors respectively. If $f(x)$ and $g(x)$ are completely known, we can consider

$$u(t) = g^{-1}(x)\big(x_{1d}^{(n)}(t) + u_{pd}(t) - f(x)\big), \tag{5.1}$$

where $u_{pd}(t) = -k^T e(t)$, with $e(t) = x - x_d$.
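The control law 5.1 can be sketched in a few lines for the case where the plant functions are known. The function names, the argument layout, and the toy plant functions below are assumptions made for illustration only.

```python
def pd_term(k, e):
    """u_pd = -k^T e for gain vector k and error vector e."""
    return -sum(ki * ei for ki, ei in zip(k, e))


def control_input(f, g, x, x_d, xd_n, k):
    """Feedback-linearizing input u = g(x)^-1 (x_d^(n) + u_pd - f(x)).

    f, g : callables giving the (assumed known) plant functions
    x    : state vector (x1, x1', ..., x1^(n-1))
    x_d  : desired state vector of the same length
    xd_n : n-th derivative of the desired output
    k    : PD gain vector (k1, ..., kn)
    """
    e = [xi - xdi for xi, xdi in zip(x, x_d)]
    return (xd_n + pd_term(k, e) - f(x)) / g(x)


# Toy second-order example (plant functions chosen only for illustration).
f_toy = lambda x: -2.0 * x[0] + x[1] ** 2
g_toy = lambda x: 2.0
u = control_input(f_toy, g_toy, x=[1.0, 0.5], x_d=[0.0, 0.0], xd_n=0.0, k=[1.0, 20.0])
```

With this input the closed-loop acceleration $f(x) + g(x)u$ equals $x_{1d}^{(n)} - k^T e$, which is the linear error dynamics the PD gains are meant to stabilize.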

The  problem  is  that  f(x)  and  g(x)  are  unknown  (except  for  the  assumptions 
made  earlier). 

Hence  we  have  to  make  suitable  approximations  to  the  unknown  functions 
through  interactions  with  the  plant.  It  is  in  this  context  that  neural  networks 
find  their  usefulness  in  control.  Wavelet-based  approximation  networks  are  yet 
another way of providing the required approximation capability. Let $\hat{f}$ and $\hat{g}$ be the approximations provided by such a network. The $u(t)$ in 5.1 is redefined as

$$u(t) = \hat{g}(x)^{-1}\big(x_{1d}^{(n)}(t) + u_{pd}(t) - \hat{f}(x)\big). \tag{5.2}$$

Upon substituting the control law 5.2, the error equation becomes

$$\dot{e} = Ae + b\big(-\hat{f}(x) + f(x)\big) + b\big(g(x) - \hat{g}(x)\big)u,$$

where

$$A = \begin{pmatrix} 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ -k_1 & -k_2 & \cdots & -k_n \end{pmatrix}, \qquad b = (0, \ldots, 0, 1)^T.$$

This shows that if $\hat{f}(x)$ and $\hat{g}(x)$ continuously track the unknown functions $f(x)$ and $g(x)$ respectively, while maintaining boundedness of the error $e$ (and hence boundedness of the state vector), the control problem is solved by taking the gain of the PD controller $k = (k_1, \ldots, k_n)$ to make $A$ a Hurwitz matrix.
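For a second-order plant the Hurwitz requirement on the PD gains is easy to check directly. The sketch below assumes the companion form $A = \begin{pmatrix} 0 & 1 \\ -k_1 & -k_2 \end{pmatrix}$, whose eigenvalues are the roots of $s^2 + k_2 s + k_1$; the function names are hypothetical.

```python
import cmath


def companion_eigenvalues_2d(k1, k2):
    """Eigenvalues of A = [[0, 1], [-k1, -k2]], i.e. roots of s^2 + k2 s + k1."""
    disc = cmath.sqrt(k2 * k2 - 4.0 * k1)
    return (-k2 + disc) / 2.0, (-k2 - disc) / 2.0


def is_hurwitz_2d(k1, k2):
    """True when both eigenvalues have strictly negative real part."""
    return all(ev.real < 0.0 for ev in companion_eigenvalues_2d(k1, k2))


# Gains k1 = 1, k2 = 20 give two real negative eigenvalues.
eigs = companion_eigenvalues_2d(1.0, 20.0)
```

For this second-order case the test is equivalent to requiring $k_1 > 0$ and $k_2 > 0$.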

In practice we can only attempt to guarantee an approximation in the form

$$|f(x) - f^*(x)| \le \epsilon_{af}$$

and

$$|g(x) - g^*(x)| \le \epsilon_{ag},$$

where $\epsilon_{af}, \epsilon_{ag}$ are the uniform upper bounds on the error in approximating $f(x)$ and $g(x)$ by

$$f^*(x) = \alpha_f^* + \sum_{m,n} c_{mn}^*\,\psi_{mn}$$

and

$$g^*(x) = \alpha_g^* + \sum_{m,n} d_{mn}^*\,\psi_{mn}$$

respectively. Here $\psi_{mn}$ are the wavelet frames, and we include $\alpha_f^*$ and $\alpha_g^*$ so that they can capture the mean values of $f$ and $g$, if non-zero. Such terms are not necessary in approximations based on RBFN or sigmoidal feed-forward networks.
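To make the form of the network concrete, the following one-dimensional sketch evaluates a constant term plus a weighted sum of dilated and translated wavelets. It assumes the Mexican hat mother wavelet scaled to unit maximum amplitude, a dilation base $a = 2$, and a hypothetical representation of each unit as a (weight, dilation level, centre) triple.

```python
import math


def mexican_hat(t):
    """Mexican hat mother wavelet with unit maximum amplitude at t = 0."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def network_output(x, alpha, units, a=2.0):
    """alpha + sum over units of c * psi(a^m * (x - centre)).

    units: list of (c, m, centre) triples (hypothetical representation).
    """
    return alpha + sum(
        c * mexican_hat((a ** m) * (x - centre)) for c, m, centre in units
    )


# One unit centred at the origin plus a constant term for the mean value.
y0 = network_output(0.0, alpha=0.5, units=[(2.0, 0, 0.0)])
```

Because the Mexican hat integrates to zero, the constant term `alpha` is what allows the sum to represent a non-zero mean.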

Moreover, in practice an error results from mis-tuning of the network parameters relative to the values $\alpha_f^*, \alpha_g^*, c_{mn}^*, d_{mn}^*$. Let the actual values of the parameters be $\hat{\alpha}_f, \hat{\alpha}_g, \hat{c}_{mn}, \hat{d}_{mn}$ respectively, and let the functions they constitute be $\hat{f}, \hat{g}$. Then in order to consider this error we define the following:

$$\begin{aligned}
e_{c_{mn}} &= -\hat{c}_{mn} + c_{mn}^* \\
e_{d_{mn}} &= -\hat{d}_{mn} + d_{mn}^* \\
e_{\alpha_f} &= -\hat{\alpha}_f + \alpha_f^* \\
e_{\alpha_g} &= -\hat{\alpha}_g + \alpha_g^*
\end{aligned}$$

We have to consider the resulting effect on the linearized equation and show that the presence of these mismatch error terms does not lead to instability during adaptation and learning.

5.2  Stability 

We have

$$\dot{e}(t) = Ae(t) + b\Big(e_{\alpha_f} + \sum_{m,n} e_{c_{mn}}\psi_{mn}\Big) + b\,u(t)\Big(e_{\alpha_g} + \sum_{m,n} e_{d_{mn}}\psi_{mn}\Big) + \mathrm{dist}(t)\,b, \tag{5.3}$$

where $\mathrm{dist}(t) = f(x) - \alpha_f^* - \sum_{m,n} c_{mn}^*\psi_{mn} + \big(g(x) - \alpha_g^* - \sum_{m,n} d_{mn}^*\psi_{mn}\big)u$.

First we consider the case where $g(x)$ is a known constant; without loss of generality we can take $g(x) = 1$.


Case: $g(x) = 1$

Then 5.2 becomes

$$u = x_{1d}^{(n)}(t) + u_{pd}(t) - \hat{f}(x). \tag{5.4}$$

For the moment, assuming that the network has sufficient units (this point will be elaborated later in this chapter) to approximate $f$ with the uniform error bound $\epsilon_{af}$ for all practical values of the state vector, we have that

$$|\mathrm{dist}(t)| \le \epsilon_{af}.$$

There exist different adaptive control approaches and corresponding approaches to ensure stability. We follow Lyapunov design methods, whereby the adaptation law derived is consistent with Lyapunov stability.

Since $A$ is strictly Hurwitz, by the Kalman-Yakubovich-Popov lemma, there exist symmetric and positive definite matrices $P$ and $Q$ such that

$$PA + A^TP = -Q.$$

Therefore we can consider the Lyapunov function

$$V(e, e_{c_{mn}}, e_{\alpha_f}) = \frac{1}{2}e^TPe + \frac{1}{2k_f}\Big(\sum_{m,n} e_{c_{mn}}^2 + e_{\alpha_f}^2\Big),$$

where $k_f$ is a suitable positive gain value in adaptation.

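Differentiating along the trajectories of the error equation gives

$$\dot{V}(e, e_{c_{mn}}, e_{\alpha_f}) = -\frac{1}{2}e^TQe + e^TPb\,(f - \hat{f}) + \frac{1}{k_f}\Big(\sum_{m,n} e_{c_{mn}}\dot{e}_{c_{mn}} + e_{\alpha_f}\dot{e}_{\alpha_f}\Big). \tag{5.5}$$

The first term is non-positive. Define $s = e^TPb$, the augmented error. Writing $f - \hat{f} = e_f + e_{\alpha_f} + \sum_{m,n} e_{c_{mn}}\psi_{mn}$, where $e_f$ is the inherent approximation error of the network with ideal parameters, the parameter-dependent part of the second term can be cancelled against the third term by a suitable adaptation law for $\hat{c}_{mn}$ and $\hat{\alpha}_f$. Let us consider the adaptive laws

$$\dot{e}_{c_{mn}} = -k_f\,s\,\psi_{mn}$$

and

$$\dot{e}_{\alpha_f} = -k_f\,s.$$

By definition of the parameter error terms, we have equivalently

$$\dot{\hat{c}}_{mn} = k_f\,s\,\psi_{mn}$$

and

$$\dot{\hat{\alpha}}_f = k_f\,s.$$

From 5.5 we arrive at

$$\dot{V}(e, e_{c_{mn}}, e_{\alpha_f}) = -\frac{1}{2}e^TQe + s\,e_f,$$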

where $e_f$ is the instantaneous error inherent in approximating $f$ by the network with ideal parameters, $f^*$, i.e., $e_f = f - f^*$, and is upper bounded by $\epsilon_{af}$. The presence of this term necessitates some modifications to the control law and adaptive laws. Let us consider a modification to the control law by adding a new term to $u$ in 5.4 in the form $u_\Delta = -\mathrm{sgn}(s)\,\Delta_f$, where $\Delta_f$ is a positive value chosen such that $\Delta_f \ge \epsilon_{af}$. From 5.4, the new control input is given by

$$u = x_{1d}^{(n)}(t) + u_{pd}(t) - \hat{f}(x) + u_\Delta.$$

Then

$$\dot{V}(e) = -\frac{1}{2}e^TQe + s\big(e_f - \mathrm{sgn}(s)\,\Delta_f\big).$$

We see that the second term is forced to be non-positive by means of the fact $|e_f| \le \epsilon_{af} \le \Delta_f$. Hence, initial boundedness of the state vector and the parameters implies that they remain bounded for all time. By a simple application of Barbalat’s lemma (see A.4), asymptotic convergence of the tracking error vector is established.

Since we allow for the possibility of large errors in approximation, it is desirable to have a reasonably large $\Delta_f$. We use a dead-zone $d$, i.e., we adapt $\hat{c}_{mn}$ and $\hat{\alpha}_f$ only if $e^TPe > d^2$, and $\dot{\hat{c}}_{mn}, \dot{\hat{\alpha}}_f$ are 0 otherwise.
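The adaptation with dead-zone and the additional control term can be sketched as a discrete-time update. The forward-Euler discretization with step `dt`, the list-based data layout, and all function names are assumptions of this sketch; the continuous-time laws themselves are the ones stated above for the case $g(x) = 1$.

```python
def sgn(v):
    """Sign function returning -1, 0 or 1."""
    return (v > 0) - (v < 0)


def quad_form(P, e):
    """e^T P e for a square matrix P given as nested lists."""
    n = len(e)
    return sum(e[i] * P[i][j] * e[j] for i in range(n) for j in range(n))


def augmented_error(P, b, e):
    """s = e^T P b."""
    n = len(e)
    return sum(e[i] * P[i][j] * b[j] for i in range(n) for j in range(n))


def adapt_step(c, alpha, psi, e, P, b, k_f, d, dt):
    """One Euler step of c_dot = k_f s psi and alpha_dot = k_f s.

    Adaptation is frozen inside the dead-zone e^T P e <= d^2.
    """
    if quad_form(P, e) <= d * d:
        return list(c), alpha
    s = augmented_error(P, b, e)
    new_c = [ci + dt * k_f * s * p for ci, p in zip(c, psi)]
    return new_c, alpha + dt * k_f * s


def robust_term(s, delta_f):
    """Additional control term u_Delta = -sgn(s) * Delta_f."""
    return -sgn(s) * delta_f
```

In a simulation loop, `psi` would hold the wavelet outputs $\psi_{mn}(x)$ at the current state, and `robust_term` would be added to the input of 5.4.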


Figure 5.1: The Control Architecture

Case: $g(x)$ is unknown

The $\psi_{mn}$ are normalized to unit maximum amplitude rather than to unit norm. Therefore

$$|\hat{f}| \le |\hat{\alpha}_f| + \sum_{m,n}|\hat{c}_{mn}|$$

and

$$|\hat{g}| \le |\hat{\alpha}_g| + \sum_{m,n}|\hat{d}_{mn}|.$$

Thus $\hat{f}$ and $\hat{g}$ are upper bounded; also $\hat{g}$ is ensured to be invertible by enforcing a lower bound on the network output during adaptation.

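Hence the control law is

$$u(t) = \hat{g}(x)^{-1}\big(x_{1d}^{(n)}(t) + u_{pd}(t) - \hat{f}(x)\big). \tag{5.6}$$

To ensure that the adaptation laws are consistent with Lyapunov stability, consider the Lyapunov function

$$V(e, e_{c_{mn}}, e_{d_{mn}}) = \frac{1}{2}e^TPe + \frac{1}{2k_f}\Big(\sum_{m,n} e_{c_{mn}}^2 + e_{\alpha_f}^2\Big) + \frac{1}{2k_g}\Big(\sum_{m,n} e_{d_{mn}}^2 + e_{\alpha_g}^2\Big),$$

where $k_f$ and $k_g$ are suitable positive adaptation gains. Its derivative is

$$\dot{V}(e, e_{c_{mn}}, e_{d_{mn}}) = -\frac{1}{2}e^TQe + e^TPb\big(f - \hat{f} + u\,(g - \hat{g})\big) + \frac{1}{k_f}\Big(\sum_{m,n} e_{c_{mn}}\dot{e}_{c_{mn}} + e_{\alpha_f}\dot{e}_{\alpha_f}\Big) + \frac{1}{k_g}\Big(\sum_{m,n} e_{d_{mn}}\dot{e}_{d_{mn}} + e_{\alpha_g}\dot{e}_{\alpha_g}\Big). \tag{5.7}$$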


This suggests that we can attempt the following adaptation laws (with $s$ as the augmented error defined earlier):

$$\dot{\hat{c}}_{mn} = k_f\,s\,\psi_{mn},$$

$$\dot{\hat{\alpha}}_f = k_f\,s,$$

$$\dot{\hat{d}}_{mn} = k_g\,u\,s\,\psi_{mn},$$

and

$$\dot{\hat{\alpha}}_g = k_g\,u\,s.$$

The presence of $u$ in the last two laws should be noted.
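The reason for this choice of laws is that the parameter-error terms in the derivative of the Lyapunov function cancel exactly. The sketch below checks the cancellation numerically for arbitrary values. It assumes the convention that each parameter error is the ideal value minus the actual value, so that the error rate is the negative of the parameter rate; the function names are hypothetical.

```python
def adaptation_rates(s, u, psi, k_f, k_g):
    """Rates of the actual parameters under the four adaptation laws."""
    c_dot = [k_f * s * p for p in psi]
    alpha_f_dot = k_f * s
    d_dot = [k_g * u * s * p for p in psi]
    alpha_g_dot = k_g * u * s
    return c_dot, alpha_f_dot, d_dot, alpha_g_dot


def parameter_terms_of_v_dot(s, u, psi, e_c, e_af, e_d, e_ag, k_f, k_g):
    """Sum of all parameter-error terms in V_dot; expected to vanish."""
    c_dot, af_dot, d_dot, ag_dot = adaptation_rates(s, u, psi, k_f, k_g)
    # Terms coming from e^T P b (f - f_hat + u (g - g_hat)).
    forcing = s * (e_af + sum(ec * p for ec, p in zip(e_c, psi)))
    forcing += s * u * (e_ag + sum(ed * p for ed, p in zip(e_d, psi)))
    # Terms coming from the quadratic parameter-error part of V,
    # using error rate = -(parameter rate).
    decay = -(sum(ec * cd for ec, cd in zip(e_c, c_dot)) + e_af * af_dot) / k_f
    decay -= (sum(ed * dd for ed, dd in zip(e_d, d_dot)) + e_ag * ag_dot) / k_g
    return forcing + decay


residual = parameter_terms_of_v_dot(
    s=0.3, u=-1.7, psi=[0.2, -0.5, 0.9],
    e_c=[0.1, 0.4, -0.3], e_af=0.05,
    e_d=[-0.2, 0.6, 0.1], e_ag=-0.15,
    k_f=5.0, k_g=2.0,
)
```

The factor `u` in the laws for the $g$-network parameters is what makes the second group of terms cancel, since those errors enter the error dynamics multiplied by the input.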

Upon substituting these laws in 5.7 we get

$$\dot{V}(e) = -\frac{1}{2}e^TQe + s\,e_f + s\,u\,e_g,$$

where $e_f$ and $e_g$ are the disturbances caused by the inherent error in approximating $f$ by the network with ideal parameters, $f^*$, i.e., $e_f = f - f^*$, and $g$ by $g^*$, i.e., $e_g = g - g^*$, respectively. The presence of these terms again leads to modifications similar to the case of $g(x) = 1$. Consider the additional control term $u_\Delta$.

From 5.6, the new input $u$ is given by

$$u = u_0 + u_\Delta,$$

where

$$u_0 = \hat{g}(x)^{-1}\big(x_{1d}^{(n)}(t) + u_{pd}(t) - \hat{f}(x)\big).$$

Then the $u$ appearing in the adaptive laws for $\hat{d}_{mn}$ and $\hat{\alpha}_g$ will be changed to $u_0$, i.e.,

$$\dot{\hat{d}}_{mn} = k_g\,u_0\,s\,\psi_{mn}$$

and

$$\dot{\hat{\alpha}}_g = k_g\,u_0\,s.$$

Then

$$\dot{V}(e) = -\frac{1}{2}e^TQe + s\big(e_f - \mathrm{sgn}(s)\,g(x)\,\cdots\big) + s\,u\big(e_g - \mathrm{sgn}(s)\,g(x)\,\cdots\big).$$

By making $|e_f| \le \epsilon_{af} \le \Delta_f$ and $|e_g| \le \epsilon_{ag} \le \Delta_g$ we see that $\dot{V}(e)$ is ensured to be non-positive. This ensures boundedness of the state vector and the parameters. Again, Barbalat’s lemma can be used to show (see appendix A.4) that asymptotic tracking is established.

5.3  Effects  Due  to  Dynamic  Model  Selection 

If we are to attempt on-line learning of the structure in addition to on-line adaptation of the weights, it is imperative that we take into account the effects of an incomplete set of indices $(m, n)$, which leads to violation of the upper bounds on approximation given by $\epsilon_{af}$ and $\epsilon_{ag}$. This problem can be solved by taking $\Delta_f$ and $\Delta_g$ initially large and then gradually reducing them as a function of the error vector $e$. This has the merit of ensuring that energy is not wasted unnecessarily. Such an adjustment is made possible by the fact that, although the error vector in itself does not provide any information on the dilation-translation indices to be learnt, it can be used in combination with the state vector to give useful information on the proximity to existing translations and dilations.

A crucial limitation of any fixed or adaptive control scheme without a ‘learning’ component is that unmodelled dynamics cannot be accounted for. For instance, in the case of robotic manipulator control, the friction terms are difficult to model accurately, whereas the inertia terms are well modelled. Thus such plants are difficult to control with either adaptive or fixed control strategies that are based on an assumed plant model (with either known or unknown functions). This has led some researchers to propose learning control schemes when the action to be performed is repetitive. In [6], Craig proposed linear-filter-based learning in combination with a fixed controller, on the assumption that the fixed controller can provide sufficient control so that the resulting error without the learning component is small. Such schemes have limited capability and cannot be used for more general plants. Neural networks can be used in place of the linear filter. Such a scheme can in principle combine adaptation and learning. With non-local networks, all weights get adjusted at each novel situation (in the worst case at each iteration) and learning is essentially forgotten. What is required is a scheme that captures the underlying function and adapts to novel situations as well; i.e., a scheme that effectively adapts to novel situations without erasing the learnt portion. To some extent, local learning has the capability of achieving this, since at every iteration only those weights corresponding to a local region of the input space get adjusted. However, after a small period of operation, effective learning may be lost, since adaptation could have occurred over virtually the whole range of operation and erased any learning. This leads us to consider schemes that combine adaptation and learning in the sense described above. Our proposed scheme, indicated for further work, is shown in figure 5.2. Some aspects of this can be seen in the work of Farrel, et al. [10].
Chapter 6

Simulation Results and Conclusions

In this chapter we present some simulation results. These simulations should not be interpreted as final; rather, we present them as an indication of the merits of the procedures explained in earlier chapters, and they will be improved in future work.

6.1  Simulation  for  learning  algorithms 

6.1.1  One-dimensional  problems 

The  following  function  [34]  was  used. 

$$f(x) = \begin{cases}
-2.186x - 12.864, & -10 \le x < -2 \\
4.246x, & -2 \le x < 0 \\
10e^{-0.05x - 0.5}\sin[(0.03x + 0.7)x], & 0 \le x \le 10
\end{cases}$$
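For reference, the test function can be written directly as code. The sketch assumes that the sine argument is $(0.03x + 0.7)x$ and that the interval endpoints are closed and open as in the commonly cited form of this benchmark; the name `benchmark_f` is hypothetical.

```python
import math


def benchmark_f(x):
    """Piecewise one-dimensional test function on [-10, 10]."""
    if -10.0 <= x < -2.0:
        return -2.186 * x - 12.864
    if -2.0 <= x < 0.0:
        return 4.246 * x
    if 0.0 <= x <= 10.0:
        return 10.0 * math.exp(-0.05 * x - 0.5) * math.sin((0.03 * x + 0.7) * x)
    raise ValueError("x must lie in [-10, 10]")


# The two linear pieces meet continuously at x = -2.
left_limit = -2.186 * (-2.0) - 12.864
right_value = benchmark_f(-2.0)
```

The function is continuous at both breakpoints, while its slope changes abruptly there; the oscillating third piece supplies the higher-frequency variation.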

The results in figures 6.2 and 6.3 clearly show the success of our methodology in the one-dimensional case. The performance with 22 units in this case shows how our learning algorithm automatically adds nodes where more nodes are necessary (high-frequency variations), and its significance lies in the fact that we use an on-line sequential method (i.e., data are presented only once, sequentially, and no non-linear optimization procedures are used). As is expected in any sequential method, we needed roughly eight to ten times more data than off-line methods. This should not be a problem, since each incoming data point can be discarded after presentation to the network. Figure 6.4 shows the network obtained for a reduced absolute error. The generalization performance was fully adequate in this case. This resulted in 39 nodes and an MSE of 0.005.

In another simulation, the Hermite function $f(x) = 1.1(1 - x + 2x^2)\exp(-x^2/2)$ is used as in [17].
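A sequential data stream of the kind used in on-line learning can be sketched as a generator that yields each sample once. The exponent $-x^2/2$ in the Hermite function, the sampling interval $[-4, 4]$, the uniform sampling, and the names `hermite_f` and `sample_stream` are assumptions of this sketch.

```python
import math
import random


def hermite_f(x):
    """Hermite test function 1.1 (1 - x + 2 x^2) exp(-x^2 / 2)."""
    return 1.1 * (1.0 - x + 2.0 * x * x) * math.exp(-0.5 * x * x)


def sample_stream(f, lo, hi, count, seed=0):
    """Yield (x, f(x)) pairs one at a time; each pair is seen only once."""
    rng = random.Random(seed)
    for _ in range(count):
        x = rng.uniform(lo, hi)
        yield x, f(x)


# Each sample can be discarded after it has been presented to the learner.
samples = list(sample_stream(hermite_f, -4.0, 4.0, 5))
```

Because the generator holds no history, memory use is independent of the number of samples presented.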


6.2  Simulations  for  Adaptive  Control 

For simulating the plant models considered in chapter 5, we consider the following two-dimensional plant function $f(x)$ as in [30]

$$f(x) = 4\left(\frac{\sin(4\pi x_1)}{\pi x_1}\right)\left(\frac{\sin(\pi x_2)}{\pi x_2}\right)^2,$$

and $g(x) = 1$.

The plant output is taken as $y = x_1$.

A fixed network structure was attempted. A total of 35 × 35 nodes were required for $\hat{f}$ and $\hat{g}$. The simulations were run using C.

The following values are selected: $k_1 = 1$, $k_2 = 20$. Then we obtain

$$A = \begin{pmatrix} 0 & 1 \\ -1 & -20 \end{pmatrix},$$

which satisfies the conditions inherent in the Kalman-Yakubovich-Popov lemma. Hence, with $P$ the corresponding solution of $PA + A^TP = -Q$ and $b = (0, 1)^T$, the augmented error is $s = e^TPb = p_{12}e_1 + p_{22}e_2$.
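For a second-order error system the Lyapunov equation can be solved in closed form. The sketch below assumes the companion form $A = \begin{pmatrix} 0 & 1 \\ -k_1 & -k_2 \end{pmatrix}$, $b = (0, 1)^T$, and, purely for illustration, $Q = I$; the $Q$ actually used in the simulations is not recoverable from the text, and the function names are hypothetical.

```python
def solve_lyapunov_2d(k1, k2, q11=1.0, q12=0.0, q22=1.0):
    """Solve P A + A^T P = -Q for A = [[0, 1], [-k1, -k2]] and symmetric Q."""
    p12 = q11 / (2.0 * k1)
    p22 = (p12 + 0.5 * q22) / k2
    p11 = k2 * p12 + k1 * p22 - q12
    return [[p11, p12], [p12, p22]]


def augmented_error(P, e):
    """s = e^T P b with b = (0, 1)^T, i.e. s = p12 * e1 + p22 * e2."""
    return P[0][1] * e[0] + P[1][1] * e[1]


# Gains k1 = 1, k2 = 20 with the assumed Q = I.
P = solve_lyapunov_2d(1.0, 20.0)
s = augmented_error(P, [0.2, -1.0])
```

Under the assumed $Q = I$ this gives $p_{12} = 0.5$ and $p_{22} = 0.05$, so the augmented error weights the position error ten times more heavily than the velocity error; a different $Q$ would change these weights.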

Figures  6.8,  6.9,  6.10  show  the  tracking  performance. 

6.3  Conclusions 

The use of multi-dimensional wavelet theory in network construction was established, and the method of using tensor-product wavelets was studied theoretically and experimentally. A methodology for building the network sequentially on-line was developed and shown to give good results in one dimension. In high-dimensional problems, for the same degree of success, we have to work with an inordinate amount of data, but we emphasize that this should not cause concern since the methodology is on-line.

Our results demonstrate the feasibility of using wavelets as network functions in adaptive control situations. While having certain similarities with [30] in the overall methodology used, our scheme differs from their methods in a number of ways. We do not use the sliding scheme, and we implement the network for $g$ rather than $g^{-1}$. Using a wavelet network instead of a Gaussian network does reduce the number of nodes, but not enough to make local learning practicable for high-dimensional problems. Feedforward neural networks used by other researchers (see, e.g., [3]) result in local stability, with the condition that initial errors be small. Our method has no such restriction (only boundedness is required), but this generally holds for all networks that have a linear-in-the-parameters structure. The non-local approximation property of feed-forward networks enables fewer network units to be used.

6.4  Future  Directions 

Developing sequentially growing, more compact networks needs to be studied further within a statistical framework. Research in the direction of obtaining sample complexity estimates and generalization bounds for such compact networks is indicated. Recent work by Niyogi and Girosi [22] contains good results for RBFNs, etc., and the references contained therein are good sources on this problem. The simple idea of applying a threshold to the wavelet coefficients is given more theoretical analysis by Donoho and Johnstone [9] for orthogonal wavelets in the context of estimation theory, and it should prove worthwhile to investigate further the underlying connections between general (non-orthogonal) wavelet networks and non-parametric estimation. Also, it will be useful to advance non-uniform sampling techniques to irregular lattice-based wavelet theory.

For control, further experimentation is needed on the practicality of implementing the adaptive/learning approach for complex systems. In a recent paper Narendra, et al. [21] have suggested broadening the simulations to multivariable systems, and report that attempting simultaneous identification and control leads to unstable results. Moreover, theoretical results on such problems as “persistency of excitation” in the adaptive/learning case, and on implementing control schemes for more general plant models (for example, using recurrent networks), are also desirable. Considering that a plethora of existing methods (various neural schemes, CMAC, RBFN, WN, fuzzy systems, other hybrid learning schemes) are scattered in the literature without any clear indication of comparative merits, comparative theoretical and experimental studies to unambiguously determine under what general conditions one method outperforms another are also necessary.
Figure 6.2: The Sequentially Learnt Structure of the Network When the Maximum Absolute Generalization Error is 0.6

Figure 6.3: The Generalization Performance on a Test Set of 200 Data. yd (‘+’) and ynet (‘o’) are the Actual Function and Network Output Respectively

Figure 6.4: The Sequentially Learnt Structure of the Network When the Maximum Absolute Generalization Error is 0.2

Figure 6.6: The Generalization Performance: ‘+’ and ‘o’ Mark the Actual Function and Network Output Respectively

Figure 6.8: The Tracking Error (plot title: The Tracking Error with Ka = 50; k1 = 1; k2 = 20; and d0 = 0.0005)

Figure 6.9: The Tracking Performance (plot title: The Adaptive Tracking of x1 with Ka = 50; k1 = 1; k2 = 20; curves: x1, desired and actual)

Figure 6.10: The Tracking Performance (plot title: The adaptive tracking of x2)

Appendix A


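A.1 Proof of theorem 1

As our proof closely follows Daubechies’ scheme for 1-D (see [7] and section 3.3.2 of [8]), we expand on these results to show the validity of the conditions in many dimensions.

First, we need the following generalization of the Poisson formula:

$$\sum_{k \in \mathbb{Z}^n} e^{iCk^Tx} = \Big(\frac{2\pi}{C}\Big)^n \prod_{j=1}^{n} \sum_{k_j \in \mathbb{Z}} \delta\Big(x_j - \frac{2\pi}{C}k_j\Big),$$

where $i$ is the imaginary unit and $C$ is any real non-zero constant. It can be verified by simple computations.

By applying this generalized Poisson formula and Parseval’s theorem, straightforward computations give

$$\sum_{l,k} |\langle \psi_{l,k}, f \rangle|^2 = \Big(\frac{2\pi}{b}\Big)^n \sum_{l} \int d\omega\, |\hat{\psi}(a^l\omega)|^2\, |\hat{f}(\omega)|^2 + \Delta,$$

where

$$\Delta = \Big(\frac{2\pi}{b}\Big)^n \sum_{l} \sum_{k \ne 0} \int d\omega\, \hat{\psi}(a^l\omega)\, \overline{\hat{\psi}\big(a^l\omega - \tfrac{2\pi}{b}k\big)}\, \overline{\hat{f}(\omega)}\, \hat{f}\big(\omega - \tfrac{2\pi}{b}a^{-l}k\big)$$

and $k \ne 0$ means at least one component of $k$ is not zero.

By applying the Cauchy-Schwarz inequality, we get

$$|\Delta| \le \Big(\frac{2\pi}{b}\Big)^n \sum_{k \ne 0} \Big[\beta\Big(\frac{2\pi}{b}k\Big)\,\beta\Big(-\frac{2\pi}{b}k\Big)\Big]^{1/2} \|f\|^2,$$

where $\beta(\cdot)$ is as defined in (2.5). This inequality together with the three conditions of the theorem gives

$$\Big(\frac{2\pi}{b}\Big)^n \Big\{ m(\psi; a) - \sum_{k \ne 0} \Big[\beta\Big(\frac{2\pi}{b}k\Big)\,\beta\Big(-\frac{2\pi}{b}k\Big)\Big]^{1/2} \Big\} \|f\|^2 \le \sum_{l,k} |\langle \psi_{l,k}, f \rangle|^2 \le \Big(\frac{2\pi}{b}\Big)^n \Big\{ M(\psi; a) + \sum_{k \ne 0} \Big[\beta\Big(\frac{2\pi}{b}k\Big)\,\beta\Big(-\frac{2\pi}{b}k\Big)\Big]^{1/2} \Big\} \|f\|^2.$$

The only thing left is to verify that condition (2.4) ensures the convergence of the multi-indexed series

$$\sum_{k \ne 0} \Big[\beta\Big(\frac{2\pi}{b}k\Big)\,\beta\Big(-\frac{2\pi}{b}k\Big)\Big]^{1/2}$$

and implies that the sum tends to zero when $b \to 0$, so that the coefficients of $\|f\|^2$ in the above inequalities are strictly positive for small enough $b$.

By (2.4) we have

$$\beta(\eta) \le C_\epsilon\,(1 + \eta^T\eta)^{-\frac{n(1+\epsilon)}{2}}.$$

This leads to

$$\sum_{k \ne 0} \Big[\beta\Big(\frac{2\pi}{b}k\Big)\,\beta\Big(-\frac{2\pi}{b}k\Big)\Big]^{1/2} \le C_\epsilon \Big(\frac{b}{2\pi}\Big)^{n(1+\epsilon)} \sum_{k \ne 0} \Big[\Big(\frac{b}{2\pi}\Big)^2 + |k_1|^2 + \cdots + |k_n|^2\Big]^{-\frac{n(1+\epsilon)}{2}}.$$

Considering the inequality

$$\big(c + |k_1|^2 + \cdots + |k_n|^2\big)^n \ge \big(c + |k_1|^2\big)\big(c + |k_2|^2\big)\cdots\big(c + |k_n|^2\big),$$

where $c$ is any positive constant, we see that the series in $k$ converges. Moreover, as $b \to 0$, this sum tends to zero. The proof of the theorem is thus established.
□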
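The inequality used in the last step can be checked numerically on a grid of integer multi-indices. The sketch assumes that the left-hand side is raised to the power $n$, which is what lets the $n$-dimensional series be bounded by a product of $n$ convergent one-dimensional series; the function names are hypothetical.

```python
from itertools import product


def inequality_holds(n, c, k_max):
    """Check (c + |k_1|^2 + ... + |k_n|^2)^n >= prod_j (c + |k_j|^2)
    for every integer vector k with entries in [-k_max, k_max]."""
    for k in product(range(-k_max, k_max + 1), repeat=n):
        squares = [kj * kj for kj in k]
        lhs = (c + sum(squares)) ** n
        rhs = 1.0
        for sq in squares:
            rhs *= c + sq
        if lhs < rhs * (1.0 - 1e-12):  # small tolerance for rounding
            return False
    return True


def one_dim_partial_sum(c, eps, k_max):
    """Partial sum of sum_k (c + k^2)^(-(1 + eps) / 2), the 1-D bounding series."""
    return sum((c + k * k) ** (-0.5 * (1.0 + eps)) for k in range(-k_max, k_max + 1))


ok_2d = inequality_holds(2, 0.5, 4)
ok_3d = inequality_holds(3, 2.0, 3)
```

Each factor on the right is at most $c + |k_1|^2 + \cdots + |k_n|^2$, which is why the inequality holds; without the exponent $n$ it fails already for $c = 2$, $k = (1, 1)$.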
A.2 Proof of Theorem 2

As in the proof of theorem 1, we use the Poisson formula in $n$ dimensions, and Parseval’s theorem in relation to Fourier transforms. The steps involved are similar to the proof of theorem 1, as shown below.

We arrive at

$$\sum_{j,k} |\langle \psi_{j,k}, f \rangle|^2 = (2\pi)^n \det T^{-1} \sum_{j} \int d\omega\, |\hat{\psi}(D^j\omega)|^2\, |\hat{f}(\omega)|^2 + \Delta,$$

where

$$\Delta = (2\pi)^n \det T^{-1} \sum_{j} \sum_{|k| \ne 0} \int d\omega\, \hat{\psi}(D^j\omega)\, \overline{\hat{\psi}(D^j\omega - 2\pi T^{-1}k)}\, \overline{\hat{f}(\omega)}\, \hat{f}(\omega - 2\pi D^{-j}T^{-1}k).$$

The third condition of the theorem implies the decay of $\beta$ as

$$\beta(\eta) \le (1 + \eta^T\eta)^{-\frac{n(1+\epsilon)}{2}} \cdot C_\epsilon.$$

Hence the multi-indexed series converges as in theorem 1. Moreover, it is easily seen that when $b_i \to 0$, $i = 1, \ldots, n$, the sum tends to 0; and the limit on $b$ is given by

$$b_c = \inf\Big\{ b \;\Big|\; m(\psi; a) \le \sum_{|k| \ne 0} \big[\beta(2\pi T^{-1}k)\,\beta(-2\pi T^{-1}k)\big]^{1/2} \Big\}.$$

Again, the bounds and inequalities are considered element-wise.

This completes the proof of the theorem. □
A.3 The relations required in section 3.4.1

Let $j$ denote the index of one dimension, i.e., $j \in \{1, \ldots\}$. Consider the translations $x_{ji}$ for $i \in \{1, \ldots, n-1\}$ and $x_{jn}$ at the same dilation level $m_j = m$. We have

$$\psi_{ij} = \big(1 - a^{2m}(x_j - x_{ji})^2\big)\, e^{-\frac{1}{2}a^{2m}(x_j - x_{ji})^2}.$$

Then

$$\langle \psi_{ij}, \psi_{nj} \rangle = \int_{-\infty}^{\infty} \big(1 - a^{2m}(x_j - x_{ji})^2\big)\, e^{-\frac{a^{2m}(x_j - x_{ji})^2}{2}} \big(1 - a^{2m}(x_j - x_{jn})^2\big)\, e^{-\frac{a^{2m}(x_j - x_{jn})^2}{2}}\, dx_j.$$

By writing out this expression, and using the fact that

$$\|\psi\|^2 = \tfrac{3}{4}\sqrt{\pi}\, a^{-m},$$

we arrive at the following result:

$$\langle \psi_{ij}, \psi_{nj} \rangle = \|\psi\|^2\, e^{-\frac{a^{2m}d^2}{4}} \Big(1.0 - a^{2m}d^2 + \tfrac{1}{12}a^{4m}d^4\Big),$$

where

$$d = d_j = x_{ji} - x_{jn}.$$

Therefore, zero-crossings occur at

$$(a^{2m}d^2 - 6.0)^2 = 24.0,$$

i.e., $|d| = 1.0493\,a^{-m}$ or $3.3014\,a^{-m}$. Working within the framework of a regular lattice, we may choose to have

$$|d| = a^{-m}b.$$

Then $b = 1.0493$ is a sensible choice, consistent with the conditions on frames.

Figure A.1: cos(aj) in One Dimension (plot title: The contribution due to $e^{-0.25a^{2m}d^2}a^{4m}d^4$)

Figure A.1 shows the effect of the last term in the above expression for $\langle \psi_{ij}, \psi_{nj} \rangle$. The graphical form of $\langle \psi_{ij}, \psi_{nj} \rangle$ is essentially similar to the Mexican hat function (mother wavelet) for small values of $d$ (until the first zero-crossing point) and begins to differ from it for larger values of $d$.

More generally, when the dilation levels are also different, …
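The one-dimensional result for equal dilation levels can be checked numerically. The sketch below compares a midpoint-rule estimate of the inner product of two translated Mexican hat wavelets with the closed form $\tfrac{3}{4}\sqrt{\pi}\,a^{-m} e^{-a^{2m}d^2/4}(1 - a^{2m}d^2 + \tfrac{1}{12}a^{4m}d^4)$ and evaluates the two zero-crossings. The dilation base $a = 2$, the integration limits, and the function names are assumptions of this sketch.

```python
import math


def mexican_hat(t):
    """Mexican hat mother wavelet (1 - t^2) exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def inner_product_numeric(d, a=2.0, m=0, steps=20000):
    """Midpoint-rule estimate of the inner product of two translates a distance d apart."""
    scale = a ** m
    lo = min(0.0, d) - 10.0 / scale   # tails beyond 10 / scale are negligible
    hi = max(0.0, d) + 10.0 / scale
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += mexican_hat(scale * x) * mexican_hat(scale * (x - d))
    return total * h


def inner_product_closed_form(d, a=2.0, m=0):
    """Closed-form inner product for equal dilation levels."""
    scale = a ** m
    u = (scale * d) ** 2
    norm_sq = 0.75 * math.sqrt(math.pi) / scale
    return norm_sq * math.exp(-0.25 * u) * (1.0 - u + u * u / 12.0)


def zero_crossings(a=2.0, m=0):
    """Separations |d| at which the inner product vanishes."""
    scale = a ** m
    root = math.sqrt(24.0)
    return math.sqrt(6.0 - root) / scale, math.sqrt(6.0 + root) / scale


d_first, d_second = zero_crossings()
```

Spacing the translations at the first zero-crossing makes neighbouring wavelets at the same dilation level mutually orthogonal, which is the motivation for the lattice choice discussed above.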


A.4 Barbalat’s Lemma

If $f(t)$ is a uniformly continuous function such that $\lim_{t \to \infty} \int_0^t f(\tau)\,d\tau$ exists and is finite, then $f(t) \to 0$ as $t \to \infty$.
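As a hedged illustration of how the lemma is typically applied to a Lyapunov argument of the kind used for the tracking error, the steps can be laid out as follows. The sketch takes $f(t) = \tfrac{1}{2}e^TQe$ and assumes that $\dot{e}$ remains bounded, which is an additional assumption of this sketch rather than a statement taken from the text.

```latex
% Sketch of the standard argument, assuming \dot{e} stays bounded.
\begin{align*}
\dot V \le -\tfrac12 e^T Q e \le 0
  &\;\Rightarrow\; V(t) \le V(0), \text{ so } e \text{ and the parameter errors are bounded},\\
\int_0^t \tfrac12 e^T Q e \, d\tau \le V(0) - V(t) \le V(0)
  &\;\Rightarrow\; \lim_{t\to\infty}\int_0^t \tfrac12 e^T Q e \, d\tau \text{ exists and is finite},\\
\dot e \text{ bounded}
  &\;\Rightarrow\; e^T Q e \text{ is uniformly continuous},\\
\text{Barbalat's lemma}
  &\;\Rightarrow\; e^T Q e \to 0, \text{ hence } e \to 0 \text{ since } Q \text{ is positive definite}.
\end{align*}
```

The integral in the second line is non-decreasing and bounded above, which is why its limit exists.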